Abstract. Focusing on the discrete probabilistic setting we generalize the 
combinatorial definition of cumulants to L-cumulants. This generalization 
keeps all the desired properties of the classical cumulants like semi-invariance 
and vanishing for independent blocks of random variables. These properties 
make L-cumulants useful for the algebraic analysis of statistical models. We 
illustrate this for general Markov models and hidden Markov processes in the 
case when the hidden process is binary. The main motivation of this work is to 
understand cumulant-like coordinates in algebraic statistics and to give a more 
insightful explanation why tree cumulants give such an elegant description of 
binary hidden tree models. Moreover, we argue that L-cumulants can be used 
in the analysis of certain classical algebraic varieties. 



1. Introduction 

Although moments provide a convenient summary of properties of a probability 
distribution, it was observed that these properties can generally be described in a 
simpler way using cumulants (see for example [2, Section 2.4], [11, Chapter 2]). This 
is mainly because cumulants have the ability to capture symmetries and underlying 
independencies of a probability distribution. These striking features of cumulants 
make them an interesting object of statistical study both from a theoretical and 
practical point of view. In addition, as it was shown for example in [5, 16, 18], 
cumulants and moments can be used to analyze the geometry of statistical models. 

Recently, in [27] we have suggested using a less standard system of coordinates 
which we called tree cumulants. This new coordinate system proved to be useful 
to analyze Bayesian networks on trees when some of the nodes are not observed. 
Various results on identifiability and geometry of these models have been obtained 
in [25, 26, 27], which encouraged us to study more general coordinate systems like 
that. In the present paper we propose a useful generalization of both cumulants 
and tree cumulants. 

We work in a simple probabilistic setting. Let $X = (X_1, \ldots, X_n)$ be a random 
vector such that each $X_i$ takes $r_i \geq 2$ possible values, where each $r_i$ is finite. The 
vector $X$ takes values in a finite discrete set $\mathcal{X} = \prod_{i=1}^n \mathcal{X}_i$ such that $|\mathcal{X}_i| = r_i$ 
for $i = 1, \ldots, n$. Without loss of generality we set 

$\mathcal{X} = \{0, \ldots, r_1 - 1\} \times \cdots \times \{0, \ldots, r_n - 1\}$. 

Any probability distribution of $X$ can be written as a point $P = [p(x)] \in \mathbb{R}^{\mathcal{X}}$ such 
that $p(x) \geq 0$ for all $x \in \mathcal{X}$ and $\sum_{x \in \mathcal{X}} p(x) = 1$. The set of all such points is called 
the probability simplex and it is denoted by $\Delta_{\mathcal{X}}$. 

For any function $f : \mathcal{X} \to \mathbb{R}$ the expectation of $f(X)$ is given by 

$E[f(X)] := \sum_{x \in \mathcal{X}} p(x) f(x)$. 

Let $[n] := \{1, \ldots, n\}$ and for any multiset $A = \{i_1, \ldots, i_d\}$ of elements of $[n]$ let 

$X_A = (X_{i_1}, \ldots, X_{i_d})$. 

In a similar way we define $x_A = (x_{i_1}, \ldots, x_{i_d})$ and $\mathcal{X}_A = \mathcal{X}_{i_1} \times \cdots \times \mathcal{X}_{i_d}$. For each 
such multiset $A$ we define the corresponding moment 

$\mu_A = E[X_{i_1} \cdots X_{i_d}]$ 

and the central moment 

$\mu'_A = E[(X_{i_1} - \mu_{i_1}) \cdots (X_{i_d} - \mu_{i_d})]$. 

Our convention is to write $\mu_A$ as $\mu_{i_1 \cdots i_d}$, where $i_1 \leq \cdots \leq i_d$. So for example if 
$A = \{1,2,4,4,4\}$, the corresponding moment is written as $\mu_{12444} = E[X_1 X_2 X_4^3]$. 
The same convention applies to central moments. In particular, for every $i < j$, $\mu'_{ij}$ 
is the covariance between $X_i$ and $X_j$. 

To show how cumulants can be naturally generalized we first define them formally 
and then we discuss their basic properties. Cumulants are usually computed using 
the cumulant generating function, which is defined as the logarithm of the moment 
generating function. In this paper we use an alternative definition of cumulants 
using partitions (see for example [11, 15, 19]). We say that $\pi = B_1 | \cdots | B_k$ is a 
partition (or a set partition) of $[n]$, if the blocks $B_i \neq \emptyset$ are disjoint sets whose 
union is $[n]$. A partition is called a split if it consists of two blocks. Let $\Pi([n])$ be 
the set of all set partitions of $[n]$. The cumulant of the vector $X$ is defined as 

(1)    $k_{1 \cdots n} = \sum_{\pi \in \Pi([n])} (-1)^{|\pi|-1} (|\pi|-1)! \prod_{B \in \pi} \mu_B$, 

where the sum is over all set partitions of $[n]$, the product is over all blocks of a 
partition and $|\pi|$ denotes the number of blocks of $\pi$. For example, if $n = 3$ then 
there are five partitions in $\Pi([3])$: 123, 1|23, 2|13, 12|3 and 1|2|3 and (1) gives 

(2)    $k_{123} = \mu_{123} - \mu_1\mu_{23} - \mu_2\mu_{13} - \mu_{12}\mu_3 + 2\mu_1\mu_2\mu_3$. 

Equation (1) can be generalized for any multiset $A = \{i_1, \ldots, i_d\}$ of elements of $[n]$ 
to obtain the cumulant of $X_A$. We use the bijection between $A$ and $[d]$ and write 

(3)    $k_A = \sum_{\pi \in \Pi([d])} (-1)^{|\pi|-1} (|\pi|-1)! \prod_{B \in \pi} \mu_{i_B}$, 

where $i_B = \{i_j : j \in B\}$. Hence for instance 

$k_{112} = \mu_{112} - 2\mu_1\mu_{12} - \mu_{11}\mu_2 + 2\mu_1^2\mu_2$. 
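
The partition formulas (1) and (3) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `set_partitions`, `moment` and `cumulant`, and the test distribution `p`, are illustrative assumptions. It evaluates (3) by brute-force enumeration of set partitions and compares the result with the closed forms for $k_{123}$ and $k_{112}$ given above.

```python
from itertools import product
from math import factorial, prod


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or let it form a new singleton block.
        yield [[first]] + part


def moment(p, A):
    """mu_A = E[prod_{i in A} X_i] for a multiset A of 1-based indices."""
    return sum(prob * prod(x[i - 1] for i in A) for x, prob in p.items())


def cumulant(p, A):
    """k_A from formula (3): sum over set partitions of the positions of A."""
    A = list(A)
    total = 0.0
    for pi in set_partitions(list(range(len(A)))):
        coeff = (-1) ** (len(pi) - 1) * factorial(len(pi) - 1)
        total += coeff * prod(moment(p, [A[j] for j in B]) for B in pi)
    return total


# Hypothetical distribution on {0,1} x {0,1,2} x {0,1}; the weights are arbitrary.
outcomes = list(product(range(2), range(3), range(2)))
weights = [1 + x[0] + 2 * x[1] + 3 * x[0] * x[2] for x in outcomes]
p = {x: w / sum(weights) for x, w in zip(outcomes, weights)}


def mu(*A):
    return moment(p, A)


k123_formula = (mu(1, 2, 3) - mu(1) * mu(2, 3) - mu(2) * mu(1, 3)
                - mu(1, 2) * mu(3) + 2 * mu(1) * mu(2) * mu(3))
k112_formula = (mu(1, 1, 2) - 2 * mu(1) * mu(1, 2) - mu(1, 1) * mu(2)
                + 2 * mu(1) ** 2 * mu(2))
print(abs(cumulant(p, [1, 2, 3]) - k123_formula) < 1e-12)
print(abs(cumulant(p, [1, 1, 2]) - k112_formula) < 1e-12)
```

Enumerating partitions of the positions of $A$, rather than of $A$ itself, is what makes repeated indices such as $\{1,1,2\}$ work without special handling.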

For each $x = (x_1, \ldots, x_n) \in \mathcal{X}$ define a multiset $A(x)$ as 

(4)    $A(x) = \{\underbrace{1, \ldots, 1}_{x_1 \text{ times}}, \ldots, \underbrace{n, \ldots, n}_{x_n \text{ times}}\}$ 

and let $\mathcal{A}(\mathcal{X}) = \{A(x) : x \in \mathcal{X}\}$. By the moment aliasing principle (see [12, 
Lemma 3]) there exists a polynomial isomorphism between $P = [p(x)]_{x \in \mathcal{X}}$ and two 
other systems of coordinates of $\mathbb{R}^{\mathcal{A}(\mathcal{X})} \simeq \mathbb{R}^{\mathcal{X}}$ given by moments $M = [\mu_{A(x)}]_{x \in \mathcal{X}}$ 
and by cumulants $K = [k_{A(x)}]_{x \in \mathcal{X}}$. In particular, every model $\mathcal{M} \subseteq \Delta_{\mathcal{X}}$, after a 
change of coordinates, can be equivalently expressed in terms of $M$ or $K$. 

In our discussion of cumulants the central concept is that of independence. Let 
$B \subseteq [n]$ and define the function $\mathbb{1}_{x_B}$ on $\mathcal{X}$ by $\mathbb{1}_{x_B}(X) = 1$ if $X_B = x_B$ and 
$\mathbb{1}_{x_B}(X) = 0$ otherwise. By $p_B$ denote the marginal distribution of $X_B$ defined by 

$p_B(x_B) = E[\mathbb{1}_{x_B}(X)]$ for every $x_B \in \mathcal{X}_B$. 

For any two disjoint subsets $I, J \subseteq [n]$ we say that $X_I$ and $X_J$ are independent, 
which we denote by $I \perp\!\!\!\perp J$ (or $X_I \perp\!\!\!\perp X_J$), if and only if 

$p_{I \cup J}(x_{I \cup J}) = p_I(x_I)\, p_J(x_J)$ for all $x \in \mathcal{X}$. 

The following formulation of independence in terms of moments will be helpful. 

Lemma 1.1. We have $I \perp\!\!\!\perp J$ for some disjoint sets $I, J \subseteq [n]$ if and only if 

$\mu_{A \cup B} = \mu_A \mu_B$ for all nonempty $A \in \mathcal{A}(\mathcal{X}_I)$, $B \in \mathcal{A}(\mathcal{X}_J)$, 

where $\mathcal{A}(\mathcal{X}_I) = \{A(x) : x \in \mathcal{X}_I\}$. 

Proof. We use an alternative definition of independence (see [4, page 136]) which 
states that $X_I$ and $X_J$ are independent if and only if for any two $L^2$-functions $f, g$ 
we have 

$E[f(X_I) g(X_J)] = E[f(X_I)]\, E[g(X_J)]$. 

Now the 'if' direction of the lemma is immediate. The 'only if' direction uses the 
fact that the set of values of $X$ is discrete and finite. In this case any function of 
$X$ is a polynomial function (can be represented as a polynomial in the entries of 
$X$), where the terms of these polynomials are $\prod_{i \in A} X_i$ for all $A \in \mathcal{A}(\mathcal{X})$. Thus, to 
check if $I \perp\!\!\!\perp J$, it remains to check if 

$E[f(X_I) g(X_J)] = E[f(X_I)]\, E[g(X_J)]$ 

for all polynomials $f, g$ such that $f$ has only terms $\prod_{i \in A} X_i$ for all nonempty 
$A \in \mathcal{A}(\mathcal{X}_I)$ and $g$ has only terms $\prod_{i \in B} X_i$ for all $B \in \mathcal{A}(\mathcal{X}_J)$. By expanding the 
terms of $f$ and $g$ it suffices to check that this property holds for each monomial, 
which is true by assumption. □ 

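Example 1.2. Let $n = 2$, $r_1 = 2$ and $r_2 = 3$. Then $\mathcal{X} = \{0, 1\} \times \{0, 1, 2\}$ and 

$\mathcal{A}(\mathcal{X}) = \{\emptyset, \{2\}, \{2,2\}, \{1\}, \{1,2\}, \{1,2,2\}\}$. 

Since $\mathcal{A}(\mathcal{X}_1) = \{\emptyset, \{1\}\}$ and $\mathcal{A}(\mathcal{X}_2) = \{\emptyset, \{2\}, \{2,2\}\}$, by Lemma 1.1, we have $1 \perp\!\!\!\perp 2$ 
if and only if $\mu_{12} = \mu_1\mu_2$, $\mu_{122} = \mu_1\mu_{22}$, where $\mu_{122} = E[X_1 X_2^2]$. 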

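Lemma 1.1 generalizes and we have $I_1 \perp\!\!\!\perp I_2 \perp\!\!\!\perp \cdots \perp\!\!\!\perp I_r$ for some disjoint sets 
$I_1, \ldots, I_r \subseteq [n]$ if and only if 

(5)    $\mu_{A_1 \cdots A_r} = \prod_{i=1}^r \mu_{A_i}$, for all $A_i \in \mathcal{A}(\mathcal{X}_{I_i})$, $i = 1, \ldots, r$, 

where $A_1 \cdots A_r$ is a shorthand notation for $A_1 \cup \cdots \cup A_r$. 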

Cumulants satisfy the following four basic properties, which make them useful 
for statistical modelling. 

(P1) Whenever there exists a split of the set of indices $[n]$ of $X$ into two blocks 
$A|B$ such that $A \perp\!\!\!\perp B$ then $k_{1 \cdots n} = 0$. 

(P2) For any $a \in \mathbb{R}^n$ define $\tilde X = X + a$ and for any multiset $A$ by $\tilde k_A$ denote the 
corresponding cumulant of $\tilde X_A$. Then $\tilde k_i = k_i + a_i$ for every $i = 1, \ldots, n$, 
and $\tilde k_A = k_A$ whenever $|A| \geq 2$. 

(P3) Let $Q = [q_{ij}] \in \mathbb{R}^{m \times n}$, $X \in \mathbb{R}^n$ and let $\tilde X = QX \in \mathbb{R}^m$. Define $\tilde k_A$ 
as the cumulant of $\tilde X_A$, where $A$ is a multiset of elements of $[m]$. Let 
$K^{(d)} = [k_{i_1 \cdots i_d}]$ denote the $(n \times \cdots \times n)$-tensor indexed by all multisets of 
elements of $[n]$ of size $d \geq 1$; and let $\tilde K^{(d)} = [\tilde k_{i_1 \cdots i_d}]$ be the $(m \times \cdots \times m)$-
tensor indexed by all multisets of elements of $[m]$. Then, 

$\tilde K^{(d)} = Q \cdot K^{(d)}$, for every $d \geq 1$, 

where for every multiset $\{i_1, \ldots, i_d\}$ of elements of $[m]$: 

$(Q \cdot K^{(d)})_{i_1 \cdots i_d} := \sum_{j_1=1}^n \cdots \sum_{j_d=1}^n q_{i_1 j_1} \cdots q_{i_d j_d}\, k_{j_1 \cdots j_d}$. 

In other words cumulants under linear mappings transform as contravariant 
tensors. 

(P4) For two random vectors $X$, $Y$ of dimension $n$ denote by $k_A(X)$, $k_A(Y)$ and 
$k_A(X + Y)$ the cumulants of $X$, $Y$ and $X + Y$ respectively. If $X \perp\!\!\!\perp Y$ then 
$k_A(X + Y) = k_A(X) + k_A(Y)$ for every multiset $A$ of elements of $[n]$. 

In this paper we generalize cumulants by changing the set $\Pi([n])$ in (1) for other 
set partition lattices. The term $(-1)^{|\pi|-1}(|\pi|-1)!$ in each summand of (1) is replaced 
by another function of $\pi$ which will be specified later. These generalized cumulants 
usually keep all properties (P1)-(P4) of classical cumulants. Also Brillinger's 
conditional cumulants formula derived in [1] can be generalized under additional 
conditions. 

Different forms of cumulants are known to researchers in non-commutative prob- 
ability. For example free cumulants are used in the theory of random matrices 
[9, 20] and Boolean cumulants are applied to stochastic differential equations [10]. 
All those cumulants fall under our general definition. In Proposition 5.5 we show 
that central moments can be also represented as generalized cumulants. As an 
interesting implication we get a simple computationally efficient formula for cen- 
tral moments in terms of moments (see Lemma 5.6). The proof of this formula is 
straightforward. 

As has already been pointed out in [23], cumulants and cumulant-like quantities 
are also useful in algebraic geometry. The coordinate system given by cumulants has 
a number of useful properties. For example, the tangential variety $\mathrm{Tan}((\mathbb{P}^1)^n)$, when 
expressed in binary cumulants, becomes toric. Also, the study of the secant variety 
$\mathrm{Sec}((\mathbb{P}^1)^n)$ becomes easier when we change coordinates to binary tree cumulants. 
This happens because the induced parametrization in this new coordinate system 
becomes nearly monomial (see Section 3.3). 

There are two main reasons why cumulants can be successfully used in algebraic 
geometry and in the geometric study in statistics. First, many interesting algebraic 
varieties coincide with some statistical models. Second, the whole machinery of 
cumulants is purely algebraic in the sense that nonnegativity of probabilities does 
not play any role. In fact the only condition which we impose on probabilities is 
that they sum to one. For that reason the same techniques can be applied to any 
complex tensor with coordinates summing to one. This observation links our work 
to the theory of umbral calculus [14]. 

This paper is organized as follows. In Section 2 we introduce some basic concepts 
of the theory of partially ordered sets. In Section 3 we define binary L-cumulants, 
which form a rather straightforward generalization of binary cumulants introduced 
in [23]. In Section 3.2 we present how binary L-cumulants may be used in algebraic 
geometry. This is then exemplified with a basic study of secant varieties in Section 
3.3. The general definition of L-cumulants is provided in Section 4. In Section 
5 we show that, under some mild conditions, all the basic properties (P1)-(P4) 
of classical cumulants hold also for L-cumulants. Moreover, in Section 5.3 we 
generalize Brillinger's formula for cumulants in terms of conditional cumulants. 
In Section 6 we show how the results of this paper explain why tree cumulants 
work so well for tree models. We also provide a simple analysis of processes with an 
underlying hidden two-state Markov chain, which in particular gives a very simple 
parametrization of homogeneous binary hidden Markov models. 

2. Basic combinatorics 

In this section we introduce basic combinatorial concepts used later in the paper. 
For a more detailed treatment see [22]. Recall that $\pi = B_1 | \cdots | B_k$ is a partition 
of $[n]$, if the blocks $B_i \neq \emptyset$ are disjoint sets whose union is $[n]$. Equivalently, a 
partition of $[n]$ corresponds to an equivalence relation $\sim_\pi$ on $[n]$ where $i \sim_\pi j$ if $i$ 
and $j$ lie in the same block. Let now $A$ be a multiset $A = \{i_1, \ldots, i_d\}$ of elements 
of $[n]$. We define a partition $\pi$ of $A$ using a partition $\pi$ of $[d]$ by $i_j \sim_\pi i_k$ if $j \sim_\pi k$ 
in $\Pi([d])$. The set of all partitions of $A$ is denoted by $\Pi(A)$ and by definition it is 
isomorphic to $\Pi([d])$. 

A partially ordered set $\mathcal{P}$ (or poset) is a set together with an ordering $\leq$ such 
that: $\pi \leq \pi$ for all $\pi \in \mathcal{P}$; if $\pi \leq \nu$ and $\nu \leq \pi$ then $\pi = \nu$; and if $\pi \leq \nu$ and 
$\nu \leq \delta$ then $\pi \leq \delta$ for all $\pi, \nu, \delta \in \mathcal{P}$. A subposet of $\mathcal{P}$ is any subset of $\mathcal{P}$ with the 
same ordering. As an important example of a poset consider the set $\Pi([n])$ with the 
poset structure given by refinement ordering such that $\pi \leq \nu$ in $\Pi([n])$ if and only 
if every block of $\pi$ is contained in a block of $\nu$. For instance let $n = 5$, $\pi = 13|4|25$ 
and $\nu = 1235|4$ then $\pi \leq \nu$. 

We say that $\mathcal{P}$ has a $\hat 0$ if there exists an element $\hat 0 \in \mathcal{P}$ such that $\pi \geq \hat 0$ for all 
$\pi \in \mathcal{P}$. Similarly, $\mathcal{P}$ has a $\hat 1$ if there exists $\hat 1 \in \mathcal{P}$ such that $\pi \leq \hat 1$ for all $\pi \in \mathcal{P}$. If 
$\pi$ and $\nu$ belong to a poset $\mathcal{P}$, then an upper bound of $\pi$ and $\nu$ is an element $\delta \in \mathcal{P}$ 
satisfying $\delta \geq \pi$ and $\delta \geq \nu$. A least upper bound of $\pi$ and $\nu$ is an upper bound $\delta$ 
of $\pi$ and $\nu$ such that every upper bound $\gamma$ of $\pi$ and $\nu$ satisfies $\gamma \geq \delta$. If a least 
upper bound of $\pi$ and $\nu$ exists, then it is clearly unique and it is denoted by $\pi \vee \nu$. 
Dually one can define the greatest lower bound $\pi \wedge \nu$ when it exists. We call $\vee$ the 
join operator and $\wedge$ the meet operator. 

A lattice is a poset $L$ for which every pair of elements has a least upper bound 
and greatest lower bound. A sublattice of a lattice $L$ is a nonempty subset of $L$ 
which is a lattice with the same meet and join operations as $L$. Clearly all finite 
lattices have a $\hat 0$ and $\hat 1$. In particular $\Pi([n])$ forms a lattice where the $n$-block 
partition $1|2|\cdots|n$ is the $\hat 0$, and the one-block partition $[n]$ is the $\hat 1$ of this lattice. 
A meet semilattice is a poset $S$ for which every pair of elements has a greatest lower 
bound. A meet subsemilattice of $S$ is a subposet of $S$ which forms a meet semilattice 
with the same meet operator as $S$. Dually we define a join semilattice and a join 
subsemilattice. 

Definition 2.1. By a partition lattice of a set $[n]$ we mean any lattice $L$ which 
forms a subposet of $\Pi([n])$ and both the one-block partition $[n]$ and the minimal 
partition $1|2|\cdots|n$ lie in $L$. 

Note that we do not require that a partition lattice forms a sublattice of $\Pi([n])$. 

Definition 2.2. The following is a list of interesting set partition lattices. 

(1) A partition $\pi \in \Pi([n])$ is non-crossing if there is no quadruple of elements 
$i < j < k < l$ such that $i \sim_\pi k$, $j \sim_\pi l$ and $i \not\sim_\pi j$. The non-crossing 
partitions of $[n]$ form a lattice which we denote by $NC([n])$. This lattice 
is not a sublattice of $\Pi([n])$, however, it is a meet subsemilattice of $\Pi([n])$ 
because the meet operators coincide. 

(2) An interval partition of $[n]$ is a partition $\pi$ of the form 

$1 \cdots i_1 \,|\, (i_1 + 1) \cdots i_2 \,|\, \cdots \,|\, (i_k + 1) \cdots n$ 

for some $0 \leq k \leq n - 1$ and $1 \leq i_1 < \cdots < i_k \leq n - 1$. The poset of 
all interval partitions is denoted by $\mathcal{I}([n])$. It forms a sublattice of $\Pi([n])$ 
isomorphic to the Boolean lattice of $[n - 1]$. 

(3) A partition $\pi \in \Pi([n])$ is called a one-cluster partition if it contains at most 
one block of size greater than one. In particular the one-block partition $[n]$ 
and the minimal partition $1|2|\cdots|n$ are one-cluster partitions. The poset of 
all one-cluster partitions forms a lattice $\mathcal{C}([n])$, which is not a sublattice of 
$\Pi([n])$. It is isomorphic to the poset of all subsets of $[n]$ excluding singletons. 
It forms a meet subsemilattice of $\Pi([n])$. 

(4) Let $T = (V, E)$ be a fixed tree with set of nodes $V$, set of edges $E$ and with 
$n$ leaves labelled by $[n]$. Removing a subset of edges $E'$ from $E$ induces a 
forest. Restricting $[n]$ to the connected components of this forest gives a 
tree partition $\pi$ induced by $T$. The set of all tree partitions induced by $T$ 
is denoted by $\mathcal{T}^T([n])$ and it forms a lattice which is a meet subsemilattice 
of $\Pi([n])$. For an example of a tree and the induced lattice of partitions see 
Figure 1 (for $n = 4$) and Figure 2. 

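For every poset $\mathcal{P}$ we define the Möbius function $m_{\mathcal{P}} : \mathcal{P} \times \mathcal{P} \to \mathbb{R}$ by 

(6)    $m_{\mathcal{P}}(\pi, \nu) = \begin{cases} 1, & \text{if } \pi = \nu, \\ -\sum_{\pi \leq \delta < \nu} m_{\mathcal{P}}(\pi, \delta), & \text{if } \pi < \nu, \\ 0, & \text{otherwise.} \end{cases}$ 

When there is no ambiguity we usually drop $\mathcal{P}$ in the notation denoting the Möbius 
function on $\mathcal{P}$ by $m$. Note that directly from the definition in (6) 

(7)    $\sum_{\pi \leq \delta \leq \nu} m(\pi, \delta) = \begin{cases} 0, & \text{if } \pi < \nu, \\ 1, & \text{if } \pi = \nu. \end{cases}$ 

A special type of a subposet of $\mathcal{P}$ is the interval 

$[\pi, \nu] = \{\delta \in \mathcal{P} : \pi \leq \delta \leq \nu\}$, 

defined whenever $\pi \leq \nu$. The Möbius function on this subposet is naturally induced 
from the Möbius function on $\mathcal{P}$ (see for example [13, Proposition 4]). For any two 
posets $\mathcal{P}_1, \mathcal{P}_2$ we define the poset $\mathcal{P}_1 \times \mathcal{P}_2$ as a set with the ordering $(\pi, \nu) \leq (\pi', \nu')$ 
if $\pi \leq \pi'$ and $\nu \leq \nu'$. The following result gives a convenient way of finding a 
Möbius function for posets constructed from other posets by taking products. 

Proposition 2.3 (Proposition 3.1.2, [22]). Let $\mathcal{P}_1$ and $\mathcal{P}_2$ be finite posets, and let 
$\mathcal{P}_1 \times \mathcal{P}_2$ be their direct product. If $(\pi, \nu) \leq (\pi', \nu')$ in $\mathcal{P}_1 \times \mathcal{P}_2$, then 

$m_{\mathcal{P}_1 \times \mathcal{P}_2}((\pi, \nu), (\pi', \nu')) = m_{\mathcal{P}_1}(\pi, \pi')\, m_{\mathcal{P}_2}(\nu, \nu')$. 

The Möbius function is especially useful due to the following result. 

Proposition 2.4 (Möbius inversion formula). Let $\mathcal{P}$ be a finite poset. Let $f, g : 
\mathcal{P} \to \mathbb{R}$. Then 

$g(\pi) = \sum_{\nu \leq \pi} f(\nu)$ for all $\pi \in \mathcal{P}$, 

if and only if 

$f(\pi) = \sum_{\nu \leq \pi} m(\nu, \pi)\, g(\nu)$ for all $\pi \in \mathcal{P}$. 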

For every lattice denote $m(\pi) := m(\pi, \hat 1)$. Later we will see that it is particularly 
important to identify values of $m(\pi)$ for various partition lattices. For $\Pi([n])$ we 
have $m(\pi) = (-1)^{|\pi|-1}(|\pi|-1)!$. The lattice of interval partitions is isomorphic 
to the Boolean lattice of all subsets of $[n - 1]$ and hence $m(\pi) = (-1)^{|\pi|-1}$. For the 
lattice of one-cluster partitions we have 

(8)    $m(\pi) = \begin{cases} (-1)^{n-1}(n-1), & \text{if } \pi = 1|2|\cdots|n, \text{ and} \\ (-1)^{|\pi|-1}, & \text{otherwise.} \end{cases}$ 

For the other cases in Definition 2.2 the Möbius function can be computed recursively. 
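
The recursive computation can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's own implementation: partitions are stored as frozensets of blocks, the helper names are hypothetical, and the recursion is a direct transcription of definition (6) applied to a subposet of $\Pi([n])$ with the refinement ordering. For $n = 4$ it reproduces the closed forms quoted above for $\Pi([n])$, for the interval partitions and for the one-cluster partitions.

```python
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def freeze(part):
    """Hashable form of a partition: a frozenset of frozenset blocks."""
    return frozenset(frozenset(B) for B in part)


def refines(pi, nu):
    """Refinement order: every block of pi lies inside some block of nu."""
    return all(any(B <= C for C in nu) for B in pi)


def mobius(poset):
    """Return a memoized m(pi, nu) on `poset`, following the recursion in (6)."""
    cache = {}

    def m(pi, nu):
        if pi == nu:
            return 1
        if not refines(pi, nu):
            return 0
        if (pi, nu) not in cache:
            cache[(pi, nu)] = -sum(
                m(pi, d) for d in poset
                if d != nu and refines(pi, d) and refines(d, nu))
        return cache[(pi, nu)]

    return m


n = 4
top = freeze([list(range(1, n + 1))])
bottom = freeze([[i] for i in range(1, n + 1)])
all_parts = [freeze(part) for part in set_partitions(list(range(1, n + 1)))]
# Interval partitions: every block is a run of consecutive integers.
interval = [pi for pi in all_parts
            if all(max(B) - min(B) + 1 == len(B) for B in pi)]
# One-cluster partitions: at most one block of size greater than one.
one_cluster = [pi for pi in all_parts if sum(len(B) > 1 for B in pi) <= 1]

m_all, m_int, m_oc = mobius(all_parts), mobius(interval), mobius(one_cluster)
print(m_all(bottom, top), m_int(bottom, top), m_oc(bottom, top))
```

The same `mobius` helper applies unchanged to any other subposet given as a list of frozen partitions, at the cost of a cubic scan over the poset, which is only practical for small $n$.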



3. Binary L-cumulants 

In this section we discuss binary L-cumulants which generalize binary cumulants 
of [23]. Most of the technical results will be stated without proofs, which will then 
be given in a more general context in later sections. 

3.1. Definition and basic facts. Assume that $\mathcal{X} = \{0, 1\}^n$, in which case $\mathcal{A}(\mathcal{X})$ 
is the set of all subsets of $[n]$. Let $L \subseteq \Pi([n])$ be a partition lattice of $[n]$. For every 
$I \subseteq [n]$ consider $L(I)$ as the subposet of $\Pi(I)$ obtained from $L$ by constraining each 
partition to the subset $I$. The Möbius function on $L(I)$ is also denoted by $m$ unless 
it may lead to ambiguity in which case we write explicitly $m_I$. 

A multiplicative function on $L(I)$ is any function $f$ such that for every $\pi \in L(I)$ 

$f(\pi) = \prod_{B \in \pi} f_B$ for some $f_B \in \mathbb{R}$. 

First consider the case when $L = \Pi([n])$. For every $I \subseteq [n]$ and $\nu \in \Pi(I)$ define 

(9)    $k(\nu) = \sum_{\pi \leq \nu} m(\pi, \nu)\, \mu(\pi)$, 

where $\mu(\pi) = \prod_{B \in \pi} \mu_B$ is a multiplicative function and the sum is taken over 
elements $\pi$ of $\Pi(I)$ such that $\pi \leq \nu$. The one-block partition $I$ is the unique 
maximal element of the lattice $\Pi(I)$. The Möbius function on $\Pi(I)$ satisfies $m(\pi) := 
m(\pi, I) = (-1)^{|\pi|-1}(|\pi|-1)!$ for all $\pi \in \Pi(I)$. It follows by (3) that $k_I = k(I)$ and 
hence (9) evaluated at $\nu = I$ gives the definition of binary cumulants. 

To get the inverse formula for moments in terms of cumulants we need the 
following result. 

Lemma 3.1. For every $\nu \in \Pi(I)$ we have $k(\nu) = \prod_{B \in \nu} k_B$, where $k(\nu)$ is defined 
by (9). 

Proof. Note that every interval $[\pi, \nu] \subseteq \Pi(I)$ is isomorphic to a product of intervals 
$\prod_{B \in \nu} [\pi(B), B] \subseteq \prod_{B \in \nu} \Pi(B)$, where $\pi(B)$ denotes $\pi$ constrained to elements 
in $B \subseteq I$. By Proposition 2.3 a Möbius function on a product of posets is equal 
to the product of Möbius functions for each individual factor. Hence, (9) can be 
rewritten as 

$k(\nu) = \prod_{B \in \nu} \Big( \sum_{\delta \in \Pi(B)} m(\delta, B)\, \mu(\delta) \Big) = \prod_{B \in \nu} k_B$, 

which finishes the proof. □ 

The inverse formula for moments in terms of cumulants follows directly by Proposition 2.4 
and Lemma 3.1. For every $I \subseteq [n]$ we have 

(10)    $\mu_I = \sum_{\pi \in \Pi(I)} k(\pi) = \sum_{\pi \in \Pi(I)} \prod_{B \in \pi} k_B$. 

We can directly generalize the definition of binary cumulants to binary L-cumulants. 
Let $L$ be a partition lattice of $[n]$. Define binary L-cumulants by 

(11)    $\ell_I = \sum_{\pi \in L(I)} m(\pi) \prod_{B \in \pi} \mu_B$ for every $I \subseteq [n]$. 

By definition for every $I \subseteq [n]$ the maximal and minimal element of the lattice $L(I)$ 
coincide with the maximal and minimal element of $\Pi(I)$, respectively. In particular for every 
$L$ we have $\ell_i = \mu_i$ for $i = 1, \ldots, n$; and $\ell_{ij} = \mu_{ij} - \mu_i\mu_j$ for all $1 \leq i < j \leq n$. 
However, already when $n = 3$ not all L-cumulants coincide with cumulants. 

Example 3.2. Let $n = 3$ and consider L-cumulants induced by the lattice of 
interval partitions. The lattice $\mathcal{I}([3])$ has four elements: 123, 1|23, 12|3 and 1|2|3 
and $m(\pi) = (-1)^{|\pi|-1}$. Therefore, we have 

$\ell_{123} = \mu_{123} - \mu_1\mu_{23} - \mu_{12}\mu_3 + \mu_1\mu_2\mu_3$. 

Compare this with the formula for $k_{123}$ in (2) to note that not only the term $\mu_2\mu_{13}$ 
is missing now in the formula for $\ell_{123}$ but also the coefficient of $\mu_1\mu_2\mu_3$ is 1 not 2. 

Let $\pi \in \Pi([n])$ be a set partition into blocks $B_1, \ldots, B_r$. Denote 

$\perp\!\!\!\perp_{B \in \pi} X_B := X_{B_1} \perp\!\!\!\perp \cdots \perp\!\!\!\perp X_{B_r}$. 

By (5), $\perp\!\!\!\perp_{B \in \pi} X_B$ if and only if 

(12)    $\mu_I = \mu(\pi(I))$ for every $I \subseteq [n]$, 

where $\pi(I)$ denotes $\pi$ constrained to elements in $I$. So for example the full 
independence is given by the minimal partition $\pi = 1|2|\cdots|n$ and $\mu_I = \prod_{i \in I} \mu_i$ for 
every $I \subseteq [n]$. 

Below we list the basic facts about binary L-cumulants. They are proved in a 
more general setting in Section 5. The following result implies that (P1) holds for 
binary L-cumulants. 

Proposition 3.3. There exists a partition $\pi_0 \in L$ such that $\perp\!\!\!\perp_{B \in \pi_0} X_B$ if and only 
if $\ell(\pi) = 0$ for all $\pi \not\leq \pi_0$, or equivalently, if $\ell_I = 0$ unless $I$ is contained in one of 
the blocks of $\pi_0$ (equivalence follows from Theorem 5.2). 

Proof. The 'if' part of the proposition is given in a more general setting in 
Proposition 5.3. To prove the opposite implication use Theorem 5.2 to conclude 
that $\ell(\pi) = 0$ for all $\pi \not\leq \pi_0$ implies that $\mu_I = \mu(\pi_0(I))$ for all $I \subseteq [n]$ which by 
(12) implies $\perp\!\!\!\perp_{B \in \pi_0} X_B$. □ 

Example 3.4. Consider the situation of Example 3.2, where $n = 3$ and L-cumulants 
are defined by the lattice of interval partitions. If $X_1 \perp\!\!\!\perp (X_2, X_3)$ then $\mu_{123} = \mu_1\mu_{23}$, 
$\mu_{12} = \mu_1\mu_2$ and $\mu_{13} = \mu_1\mu_3$. It follows that $\ell_{12} = \ell_{13} = \ell_{123} = 0$. On the other 
hand, the condition $X_2 \perp\!\!\!\perp (X_1, X_3)$ does not imply that $\ell_{123} = 0$ because in this 
case 

$\ell_{123} = \mu_2\mu_{13} - \mu_1\mu_2\mu_3$, 

which is zero only when in addition $\mu_{13} = \mu_1\mu_3$ and hence when $X_1 \perp\!\!\!\perp X_3$. Here 
there is no contradiction with Proposition 3.3 because $2|13 \notin L([3])$. 
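
Examples 3.2 and 3.4 are easy to reproduce numerically. The sketch below is illustrative only: the function names and the two test distributions `p_a` and `p_b` are assumptions, and the code relies on the observation that restricting the interval partitions of $[n]$ to $I$ gives the interval partitions of $I$ in sorted order, so that $m(\pi) = (-1)^{|\pi|-1}$ can be used in (11) for every $I$.

```python
from itertools import combinations, product
from math import prod


def interval_partitions(I):
    """Yield all partitions of I into runs that are contiguous in sorted order."""
    I = sorted(I)
    for r in range(len(I)):
        for cuts in combinations(range(1, len(I)), r):
            bounds = (0, *cuts, len(I))
            yield [tuple(I[a:b]) for a, b in zip(bounds, bounds[1:])]


def moment(p, B):
    """mu_B = E[prod_{i in B} X_i] for 1-based indices in B."""
    return sum(prob * prod(x[i - 1] for i in B) for x, prob in p.items())


def interval_cumulant(p, I):
    """ell_I from (11) with m(pi) = (-1)^(|pi|-1) on the interval lattice."""
    return sum((-1) ** (len(pi) - 1) * prod(moment(p, B) for B in pi)
               for pi in interval_partitions(I))


# Hypothetical inputs: a dependent pair law q and a single-variable law s.
q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
s = {0: 0.7, 1: 0.3}
cube = list(product((0, 1), repeat=3))
p_a = {(x1, x2, x3): s[x1] * q[(x2, x3)] for x1, x2, x3 in cube}  # X1 indep. of (X2, X3)
p_b = {(x1, x2, x3): s[x2] * q[(x1, x3)] for x1, x2, x3 in cube}  # X2 indep. of (X1, X3)

for I in [(1, 2), (1, 3), (1, 2, 3)]:
    print(I, interval_cumulant(p_a, I), interval_cumulant(p_b, I))
```

Under `p_a` all three L-cumulants vanish up to rounding, while under `p_b` the value of $\ell_{123}$ equals $\mu_2(\mu_{13} - \mu_1\mu_3)$, which is nonzero for this choice of `q`.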

Under a minor additional condition the property (P2) also holds for binary 
L-cumulants. 

Proposition 3.5. Suppose that for every $i \in [n]$ the split $i|([n] \setminus i)$ lies in $L$. Let 
$\tilde X = X + a$, where $a \in \mathbb{R}^n$ and, for every $I \subseteq [n]$, by $\tilde\ell_I$ denote the corresponding 
L-cumulant of the subvector $\tilde X_I$. Then $\tilde\ell_i = \ell_i + a_i$ for all $i = 1, \ldots, n$ and $\tilde\ell_I = \ell_I$ 
for any $I \subseteq [n]$ such that $|I| \geq 2$. 

Proof. This follows from Proposition 5.4. □ 

Define central binary L-cumulants by replacing moments $\mu_B$ in (11) by central 
moments $\mu'_B$. For every $I \subseteq [n]$ the corresponding central binary L-cumulant is 
denoted by $\ell'_I$. 

Lemma 3.6. Under the assumptions of Proposition 3.5 we have $\ell'_I = \ell_I$ for every 
$I \subseteq [n]$ such that $|I| \geq 2$. 

Proof. Central binary L-cumulants of $X$ can be alternatively defined as binary 
L-cumulants of $\tilde X$, where $\tilde X_i = X_i - E X_i$. The lemma follows from Proposition 3.5. 
□ 

In the next section we show how all these ideas can be applied in algebraic 
geometry. 

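3.2. Geometric applications. We consider algebraic varieties in either the real 
space $\mathbb{R}^{2^n} = \mathbb{R}^{2 \times \cdots \times 2}$ or its complexification $\mathbb{C}^{2^n} = \mathbb{C}^{2 \times \cdots \times 2}$, or projectivization 
$\mathbb{P}^{2^n - 1} = \mathbb{P}(\mathbb{C}^{2 \times \cdots \times 2})$. Each component $\mathbb{C}^2$ (or $\mathbb{R}^2$) has basis $e_0, e_1$ so that 
$e_{i_1} \otimes \cdots \otimes e_{i_n}$ corresponds to $I \subseteq [n]$ for $i_j = 1$ if $j \in I$ and $i_j = 0$ otherwise. For 
example, if $n = 2$ and $\mu \in \mathbb{C}^{2 \times 2}$ then we write $\mu$ in our basis as 

$\mu = \mu_\emptyset\, e_0 \otimes e_0 + \mu_1\, e_1 \otimes e_0 + \mu_2\, e_0 \otimes e_1 + \mu_{12}\, e_1 \otimes e_1$. 

Formula (11) gives an isomorphism of the affine subspace $\mu_\emptyset = 1$ in $\mathbb{R}^{2^n}$ (or $\mathbb{C}^{2^n}$), 
which forms a Zariski open subset of $\mathbb{P}^{2^n - 1}$. The inverse map is computed in a 
more general case in (21). 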

We first show that some basic operations on the random vector $X$ can encode 
interesting actions on the space of $2 \times \cdots \times 2$ tensors. Define $\tilde X$ such that $\tilde X_i = \lambda_i X_i$ 
for $\lambda_i \in \mathbb{C} \setminus \{0\}$ for $i = 1, \ldots, n$. Multiplying each $X_i$ by $\lambda_i$ results in the change of 
moments from $\mu_I$ to $\tilde\mu_I = \prod_{i \in I} \lambda_i\, \mu_I$ and hence it corresponds to the action of the 
group $D^n$, where $D$ is a group of diagonal matrices of the form 

$\begin{pmatrix} 1 & 0 \\ 0 & \lambda \end{pmatrix}$ for $\lambda \in \mathbb{C} \setminus \{0\}$. 

Because L-cumulants are multilinear functions of the moments we conclude that 
this action is the same on the level of L-cumulants. We have $\tilde\ell_I = \prod_{i \in I} \lambda_i\, \ell_I$ for 
every $I \subseteq [n]$. 

Suppose now that $\tilde X = X + b$, for $b = (b_1, \ldots, b_n) \in \mathbb{C}^n$, and consider the group 
$U(2)^n$ where $U(2)$ is the unipotent group of $2 \times 2$-matrices of the form 

$\begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}$ for $\lambda \in \mathbb{C}$. 

Adding $b$ to the vector $X$ corresponds to the action of $U(2)^n$, with $\lambda_i = b_i$ for 
$i = 1, \ldots, n$, on the space of moments. We illustrate this with an example that 
easily generalizes. 

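Example 3.7. Let $n = 2$ and denote by $\tilde\mu = [\tilde\mu_I]$ the moments of the vector 
$\tilde X = X + b$. We have $\tilde\mu_\emptyset = 1$, $\tilde\mu_i = E(X_i + b_i) = \mu_i + b_i$ for $i = 1, 2$ and 

$\tilde\mu_{12} := E[(X_1 + b_1)(X_2 + b_2)] = \mu_{12} + b_1\mu_2 + \mu_1 b_2 + b_1 b_2$. 

Write $\mu = [\mu_I] \in \mathbb{C}^{2 \times 2}$: 

$\mu = e_0 \otimes e_0 + \mu_1\, e_1 \otimes e_0 + \mu_2\, e_0 \otimes e_1 + \mu_{12}\, e_1 \otimes e_1$. 

After applying the action of $U(2)^2$ with $\lambda_i = b_i$ for $i = 1, 2$ we obtain 

$\tilde\mu = (e_0 + b_1 e_1) \otimes (e_0 + b_2 e_1) + \mu_1\, e_1 \otimes (e_0 + b_2 e_1) + \mu_2\, (e_0 + b_1 e_1) \otimes e_1 + \mu_{12}\, e_1 \otimes e_1$ 
$\quad = e_0 \otimes e_0 + (\mu_1 + b_1)\, e_1 \otimes e_0 + (\mu_2 + b_2)\, e_0 \otimes e_1 + (\mu_{12} + b_1\mu_2 + \mu_1 b_2 + b_1 b_2)\, e_1 \otimes e_1$ 
$\quad = e_0 \otimes e_0 + \tilde\mu_1\, e_1 \otimes e_0 + \tilde\mu_2\, e_0 \otimes e_1 + \tilde\mu_{12}\, e_1 \otimes e_1$, 

which confirms that translating $X$ by $b \in \mathbb{R}^n$ corresponds to the action of $U(2)^n$ 
on $\mu$. 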
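
The computation in Example 3.7 can be replayed numerically. The following sketch makes several assumptions that are not in the original text: the joint distribution `p`, the shift `b`, the helper names, and the convention that a unipotent matrix acts on coordinates as $\begin{pmatrix} 1 & 0 \\ b & 1 \end{pmatrix}$ in the basis $e_0, e_1$. It compares the moments of $X + b$ computed directly with those obtained by applying the two unipotent factors to the moment tensor.

```python
from itertools import product


def moments(p, shift=(0.0, 0.0)):
    """Moment tensor of X + shift, indexed so that mu[i1][i2] = E[Y1^i1 * Y2^i2]."""
    mu = [[0.0, 0.0], [0.0, 0.0]]
    for (x1, x2), prob in p.items():
        y1, y2 = x1 + shift[0], x2 + shift[1]
        for i1, i2 in product((0, 1), repeat=2):
            mu[i1][i2] += prob * y1 ** i1 * y2 ** i2
    return mu


def unipotent(b):
    """Matrix of e_0 -> e_0 + b*e_1, e_1 -> e_1 in the basis (e_0, e_1)."""
    return [[1.0, 0.0], [b, 1.0]]


def act(U1, U2, mu):
    """Apply U1 to the first tensor factor and U2 to the second."""
    return [[sum(U1[i][k] * U2[j][l] * mu[k][l]
                 for k in (0, 1) for l in (0, 1))
             for j in (0, 1)] for i in (0, 1)]


# Hypothetical joint law of (X1, X2) on {0,1}^2 and an arbitrary shift b.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
b = (0.5, -2.0)

mu = moments(p)
shifted = moments(p, b)
acted = act(unipotent(b[0]), unipotent(b[1]), mu)
print(all(abs(acted[i][j] - shifted[i][j]) < 1e-12
          for i in (0, 1) for j in (0, 1)))
```

In the same setting the covariance $\mu_{12} - \mu_1\mu_2$ is unchanged by the shift, which is the $n = 2$ instance of the invariance of higher order L-cumulants.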

For every $I \subseteq [n]$, denote by $\tilde\ell_I$ the L-cumulant of $\tilde X_I$. By Proposition 3.5, 
whenever every split $i|([n] \setminus i)$ lies in $L$, this complicated transformation of moments 
induced by $U(2)^n$ translates to a very simple transformation of cumulants. We have 
$\tilde\ell_i = \ell_i + b_i$ for $i \in [n]$ and $\tilde\ell_I = \ell_I$ for all $I \subseteq [n]$ such that $|I| \geq 2$ and hence all 
the higher order L-cumulants are invariant with respect to the action of $U(2)^n$ on 
the space of moments. 

Changing values of the binary variables $X_i$ from $0, 1$ to $b_i, a_i$ means defining a 
new random vector $\tilde X$ such that $\tilde X_i = (a_i - b_i) X_i + b_i$. We have just shown that 
changing values of the components of $X$ corresponds to a natural action of the 
$n$-dimensional torus $(\mathbb{C}^*)^n$ with coordinates $a_i - b_i$ on the space $\mathbb{C}^{2^n - n - 1}$ whose 
coordinates are the higher order L-cumulants $\ell_I$, $|I| \geq 2$. More specifically the 
L-cumulants of $\tilde X$, such that $\tilde X_i = (a_i - b_i) X_i + b_i$, are transformed by 

$\tilde\ell_I = \ell_I \cdot \prod_{i \in I} (a_i - b_i)$ for all $I \subseteq [n]$ and $|I| \geq 2$ 

and $\tilde\ell_i = (a_i - b_i)\ell_i + b_i$ for $i = 1, \ldots, n$. This leads to the following result. 

Theorem 3.8. Suppose that for every $i \in [n]$ the split $i|([n] \setminus i)$ lies in $L$. Then a 
subvariety of $\mathbb{C}^{2^n - 1}$ is invariant under changing values of components of $X$ if and 
only if it is defined by $\mathbb{Z}^n$-homogeneous polynomials in $\ell_I$ with $|I| \geq 2$. 

Proof. See the proof of [23, Theorem 3.1]. □ 

Note that if a variety is invariant under the action of the special linear group 
$SL(2)^n$ then in particular it is invariant under $U(2)^n$. 

Corollary 3.9. Suppose that $L$ is a partition lattice of $[n]$ such that for every 
$i \in [n]$ the split $i|([n] \setminus i)$ lies in $L$. Let $V$ be a subvariety of the affine open subset 
given by $\mu_\emptyset = 1$ in the projective space $\mathbb{P}(\mathbb{C}^{2 \times \cdots \times 2})$ and let $\overline{V}$ denote its closure in 
that projective space. If $\overline{V}$ is invariant under the action of $SL(2)^n$ then the ideal 
$I_V$ that defines $V$ is generated by $\mathbb{Z}^n$-homogeneous polynomials in the L-cumulants 
$\ell_I$ with $|I| \geq 2$. 

Another important reason why L-cumulants may be useful, apart from their 
invariance properties, is related to property (P1). Denote by $\mathrm{Seg}((\mathbb{P}^1)^n)$ the Segre 
variety, which is an embedding of $(\mathbb{P}^1)^n$ into $\mathbb{P}^{2^n - 1}$. In statistics the Segre variety 
corresponds to the full independence model $X_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp X_n$. In particular 
Proposition 3.3 implies that the image of $\mathrm{Seg}((\mathbb{P}^1)^n)$ in the space given by L-cumulants 
is an affine subspace given by $\ell_I = 0$ for all $|I| \geq 2$ (see also [23, Remark 3.4]). 
Moreover, L-cumulants seem to be helpful also in the analysis of other algebraic 
varieties related to the Segre variety $\mathrm{Seg}((\mathbb{P}^1)^n)$. For example the tangential variety 
$\mathrm{Tan}((\mathbb{P}^1)^n)$ is toric when expressed in cumulants (see [23, Theorem 4.1]). In the 
following section we show how L-cumulants defined by a tree partition lattice can 
help to analyze the secant variety $\mathrm{Sec}((\mathbb{P}^1)^n)$. 

3.3. Binary tree cumulants for secant varieties. In [27] we defined tree 
cumulants, which gave a better understanding of certain statistical models related 
to trees. We write more on that in Section 6. In this section we show how tree 
cumulants can be used to study secant varieties. Recall from Definition 2.2 that, 
for a fixed tree $T$ with $n$ leaves, $\mathcal{T}^T([n])$ denotes the lattice of tree partitions of 
$[n]$ induced by $T$. Moreover, $\mathcal{T}^T(I)$ is the lattice of all tree partitions of $I$ induced 
by $T(I)$, which is the smallest subtree of $T$ containing all leaves in $I$. The tree 
cumulant of the subvector $X_I$ for every $I \subseteq [n]$ is denoted by $t_I$. Tree cumulants 
are L-cumulants and hence defined by (11): 

(13)    $t_I = \sum_{\pi \in \mathcal{T}^T(I)} m(\pi) \prod_{B \in \pi} \mu_B$, for all $I \subseteq [n]$. 

Remark 3.10. In [27, Section 3.2] binary tree cumulants were defined in terms of 
central moments by 

$t_I = \sum_{\pi \in \mathcal{T}^T(I)} m(\pi) \prod_{B \in \pi} \mu'_B$, for all $I \subseteq [n]$, $|I| \geq 2$, 

and $t_i = \mu_i$ for $i \in [n]$. In particular $t_I$ for all $|I| \geq 2$ is just the corresponding 
central L-cumulant. Let $i \in [n]$ be one of the leaves. Removing the edge incident 
with $i$ induces a split $i|([n] \setminus i)$ and hence the assumption of Proposition 3.5 holds 
and, by Lemma 3.6, it follows that $\ell_I = \ell'_I$ for all $I \subseteq [n]$ such that $|I| \geq 2$. In particular, both the 
definition in [27] and the one given in (13) are equivalent. 

Let $L$ be the lattice of tree partitions induced by the caterpillar tree in Figure 1. 
For example if $n = 4$ then the induced lattice is given in Figure 2. We first show how 
to compute L-cumulants $[t_I]$ without computing the Möbius function on the lattice 
$L$. By Remark 3.10 we can replace moments by central moments in the formula 
for $t_I$ for all $I \subseteq [n]$ such that $|I| \geq 2$. This is very convenient because $\prod_{B \in \pi} \mu'_B$ 
is zero whenever $\pi$ contains a singleton block. Note that the elements of $L$ with 
no singleton blocks correspond to all interval partitions with no singleton blocks. 
If $n = 4$ then the elements of $L$ with no singleton blocks are the two boldfaced 
elements in Figure 2. This gives that for all $I \subseteq [n]$ such that $|I| \geq 2$: 

$\sum_{\pi \in L(I)} m_{L(I)}(\pi) \prod_{B \in \pi} \mu'_B = \sum_{\pi \in \mathcal{I}(I)} m_{\mathcal{I}(I)}(\pi) \prod_{B \in \pi} \mu'_B$. 

Both sums above are over all partitions in a poset of all interval partitions with 
no singleton blocks. Hence, both Möbius functions constrained to this poset need 
to coincide. The gain is that we already computed the Möbius function on the 
right-hand side explicitly obtaining $m(\pi) = (-1)^{|\pi|-1}$ (see the end of Section 2). 

This allows us to write the map from moments $[\mu_I]$ to tree cumulants $[t_I]$ of the
caterpillar tree as a composition of two maps: from moments to central moments
and from central moments to tree cumulants induced by the caterpillar tree. We
will show at the end of Section 5.1 that the first map can be written as

$\mu'_I = \sum_{B \subseteq I} (-1)^{|I \setminus B|} \mu_B \prod_{i \in I \setminus B} \mu_i$ for all $I \subseteq [n]$, $|I| \geq 2$,

and we have just shown that the second map is given by $t_i = \mu_i$ for i = 1, . . . , n,
and

$t_I = \sum_{\pi \in \mathbf{I}(I)} (-1)^{|\pi|-1} \prod_{B \in \pi} \mu'_B$.

In particular, if n = 4 then $t_I = \mu'_I$ for all $2 \leq |I| \leq 3$ and

$t_{1234} = \mu'_{1234} - \mu'_{12}\mu'_{34}$.


Figure 1. A caterpillar tree with n leaves/legs. 
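
The composition of the two maps can be checked numerically. The following Python sketch is an illustration only: the joint distribution `pmf` is a hypothetical example, coordinates are 0-based (so the paper's leaves 1, 2, 3, 4 are 0, 1, 2, 3), and the function names are ours. It computes central moments from moments through the signed subset sum, compares them with the definition $E[\prod_{i \in I}(X_i - EX_i)]$, and then evaluates $t_{1234} = \mu'_{1234} - \mu'_{12}\mu'_{34}$.

```python
from fractions import Fraction
from itertools import combinations, product


def moment(pmf, idx):
    """mu_B = E[prod_{i in B} X_i] for a pmf stored as {outcome tuple: probability}."""
    total = Fraction(0)
    for x, p in pmf.items():
        term = p
        for i in idx:
            term *= x[i]
        total += term
    return total


def central_from_moments(pmf, idx):
    """First map: central moment of X_I as a signed sum over subsets B of I."""
    total = Fraction(0)
    for size in range(len(idx) + 1):
        for B in combinations(idx, size):
            rest = [i for i in idx if i not in B]
            term = (-1) ** len(rest) * moment(pmf, B)
            for i in rest:
                term *= moment(pmf, (i,))
            total += term
    return total


def central_direct(pmf, idx):
    """Reference value E[prod_{i in I} (X_i - E X_i)] computed from the definition."""
    means = {i: moment(pmf, (i,)) for i in idx}
    total = Fraction(0)
    for x, p in pmf.items():
        term = p
        for i in idx:
            term *= x[i] - means[i]
        total += term
    return total


def caterpillar_t1234(pmf):
    """Second map for n = 4: t_1234 = mu'_1234 - mu'_12 * mu'_34 (0-based coordinates)."""
    return (central_from_moments(pmf, (0, 1, 2, 3))
            - central_from_moments(pmf, (0, 1)) * central_from_moments(pmf, (2, 3)))


# Hypothetical joint pmf on {0,1}^4 with weights proportional to k^2 + 1.
outcomes = list(product((0, 1), repeat=4))
weights = [Fraction(k * k + 1) for k in range(len(outcomes))]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}

t1234 = caterpillar_t1234(pmf)
```

Exact rational arithmetic is used so that the two ways of computing each central moment can be compared with `==` rather than up to rounding.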



We use this new coordinate system to study the secant variety $\mathrm{Sec}((\mathbb{P}^1)^n)$. As
an example consider the case when n = 4.

Example 3.11. The secant variety $\mathrm{Sec}((\mathbb{P}^1)^4)$ is a projective variety in $\mathbb{P}^{15}$ parametrized
by 9 copies of $\mathbb{P}^1$ with coordinates $(t_0, t)$, $(a_{0i}, a_i)$ and $(b_{0i}, b_i)$ for i = 1, 2, 3, 4. The
parametrization is given by

$\mu_I = t_0 \prod_{i \in I} a_i \prod_{j \in I^c} a_{0j} + t \prod_{i \in I} b_i \prod_{j \in I^c} b_{0j}$ for all $I \subseteq [4]$,

where $I^c$ denotes the complement of I in {1,2,3,4} and $\mu = [\mu_I]$ denotes the
coordinates of the projective space $\mathbb{P}^{15}$. We want to describe the image of an
open subset of the parameter space given by $a_{0i} = b_{0i} = 1$ for $i \in \{1,2,3,4\}$ and
$t_0 = 1 - t$. This image is described by

(14) $\mu_I = (1-t) \prod_{i \in I} a_i + t \prod_{i \in I} b_i$

and in particular $\mu_\emptyset = 1$.

1234
1|234   2|134   12|34   124|3   123|4
1|2|34   1|3|24   1|4|23   14|2|3   13|2|4   12|3|4
1|2|3|4

Figure 2. The Hasse diagram of the lattice of tree partitions induced
by the tree in Figure 1 if n = 4.

Earlier in this section we explained how to compute $[t_I]$ from moments as a
composition of two simple maps. From this we can also compute the induced
parametrization directly. Here we will show an alternative way of proceeding for the
secant variety $\mathrm{Sec}((\mathbb{P}^1)^4)$ to present some other available techniques. First, use the
parametrization of the secant in terms of classical cumulants. This parametrization
was given in [23, Equations (18) and (19)], which implies that for every i < j < k

(15)
$k_{ij} = t(1-t)(b_i - a_i)(b_j - a_j)$,
$k_{ijk} = t(1-t)(1-2t)(b_i - a_i)(b_j - a_j)(b_k - a_k)$,
$k_{1234} = t(1-t)(6t^2 - 6t + 1) \prod_{i=1}^{4} (b_i - a_i)$.

Now we change coordinates from cumulants to binary tree cumulants $[t_I]$ using
Proposition 4.3. In particular, as explained in Example 4.4, since 13|24 and 14|23
are the only partitions in $\Pi([4])$ which are not tree partitions of the caterpillar tree
in Figure 1 for n = 4, this yields

(16) $t_{1234} = k_{1234} + k_{13}k_{24} + k_{14}k_{23}$

and $t_I = k_I$ for all $I \subseteq [4]$ such that $|I| \leq 3$. From this it follows that for every
$I \subseteq \{1,2,3,4\}$ such that $|I| \geq 2$:

(17) $t_I = t(1-t)(1-2t)^{|I|-2} \prod_{i \in I} (b_i - a_i)$,

which for $t_{1234}$ can be verified by direct computations. Now we can immediately
check that

$t_{I \cup J}\, t_{I' \cup J'} - t_{I \cup J'}\, t_{I' \cup J} = 0$

holds on $\mathrm{Sec}((\mathbb{P}^1)^4)$ for all distinct $I, I' \in \{\{i\},\{j\},\{i,j\}\}$ and $J, J' \in \{\{k\},\{l\},\{k,l\}\}$
and every split ij|kl of {1,2,3,4}. For example 12|34 leads to a set of equations
including $t_{13}t_{24} - t_{14}t_{23} = 0$ and $t_{1234}t_{13} - t_{123}t_{134} = 0$.

Figure 3. A 4-star tree.



This simple example can be generalized using the link between the secant varieties
and certain statistical models (see [3, Section 4.1]). Define for any two disjoint
$A, C \subseteq [n]$ the conditional probability of $X_A$ given $X_C$ as:

$p_{A|C}(x_A | x_C) := \frac{p_{A \cup C}(x_A, x_C)}{p_C(x_C)}$ for all $x_C \in \mathcal{X}_C$ s.t. $p_C(x_C) \neq 0$.

For any function f of $X_A$ define the conditional expectation of $f(X_A)$ given $X_C$ as
a function of $X_C$ given for any $x_C \in \mathcal{X}_C$ by

$E[f(X_A) | X_C = x_C] = \sum_{x_A \in \mathcal{X}_A} p_{A|C}(x_A | x_C) f(x_A)$.

We denote this conditional expectation by $E[f(X_A) | X_C]$. If $f(X_A) = \prod_{i \in A} X_i$
then we simply write $\mu^C_A$ and $\mu^C_A(x_C) = E[\prod_{i \in A} X_i | X_C = x_C]$. Note that $\mu^C_A$ is a
random variable itself.

Similarly as in the case of Lemma 1.1 we can show that for disjoint $C, B_1, \ldots, B_r \subseteq [n]$
the $X_{B_i}$'s are jointly independent given $X_C$ if

$\mu^C_{A_1 \cdots A_r} = \prod_{i=1}^{r} \mu^C_{A_i}$ for all $A_i \subseteq B_i$, i = 1, . . . , r.

In this case the marginal distribution of $(X_{B_1}, \ldots, X_{B_r})$ satisfies

(18) $\mu_{A_1 \cdots A_r} = E\big[\prod_{i=1}^{r} \mu^C_{A_i}\big] = \sum_{x_C \in \mathcal{X}_C} p_C(x_C) \prod_{i=1}^{r} \mu^C_{A_i}(x_C)$.

For a statistician the parametrization in (14) corresponds to the parametrization 
of moments of the binary 4-star tree model (naive Bayes model) as given in Figure 
3. The leaves of this tree correspond to a vector $X = (X_1, X_2, X_3, X_4)$ of binary
observed variables and the inner node corresponds to a binary variable Y which is 
not observed. This model contains all possible moments of a binary vector X such 
that all components of X are jointly independent given Y. The parametrization in 
(14) is a special version of (18). 
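
To make the link between (14) and (18) concrete, the following Python sketch builds the joint distribution of a hidden binary variable Y and four binary leaves that are independent given Y, sums Y out, and compares the resulting moments with (14). The parameter values and the function names are hypothetical and serve only as an illustration; `t` plays the role of P(Y = 1), and `a[i-1]`, `b[i-1]` are the conditional means of $X_i$ given Y = 0 and Y = 1.

```python
from fractions import Fraction
from itertools import combinations, product


def star_joint_pmf(t, a, b):
    """Joint pmf of (Y, X_1, ..., X_n) with P(Y=1) = t, P(X_i=1 | Y=0) = a[i-1]
    and P(X_i=1 | Y=1) = b[i-1]; the X_i are independent given Y."""
    pmf = {}
    for y in (0, 1):
        p_y = t if y == 1 else 1 - t
        success = b if y == 1 else a
        for x in product((0, 1), repeat=len(a)):
            p = p_y
            for p1, xi in zip(success, x):
                p *= p1 if xi == 1 else 1 - p1
            pmf[(y,) + x] = p
    return pmf


def marginal_moment(pmf, I):
    """mu_I = E[prod_{i in I} X_i]; position 0 of each outcome holds the hidden Y."""
    return sum((p for o, p in pmf.items() if all(o[i] == 1 for i in I)), Fraction(0))


def formula_14(t, a, b, I):
    """Right-hand side of (14) for leaf labels I given as 1-based indices."""
    prod_a = prod_b = Fraction(1)
    for i in I:
        prod_a *= a[i - 1]
        prod_b *= b[i - 1]
    return (1 - t) * prod_a + t * prod_b


# Hypothetical parameter values.
t = Fraction(1, 3)
a = [Fraction(1, 5), Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)]
b = [Fraction(3, 5), Fraction(3, 4), Fraction(1, 7), Fraction(1, 9)]
pmf = star_joint_pmf(t, a, b)
all_subsets = [I for k in range(5) for I in combinations((1, 2, 3, 4), k)]
agree = all(marginal_moment(pmf, I) == formula_14(t, a, b, I) for I in all_subsets)
```

For binary variables the product $\prod_{i \in I} X_i$ equals 1 exactly when all $X_i$ with $i \in I$ equal 1, which is why `marginal_moment` only needs to add up probabilities.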

The fact that (14) can be rewritten in the easier form in (17) for any $n \geq 4$
follows from more general considerations in [27, Section 4]. We obtain the following 
procedure: 

1. Consider any trivalent tree with n leaves, that is a tree such that each inner 
node has valency three. 

2. Compute tree cumulants induced by this trivalent tree. 

3. The induced parametrization of the n-star tree model in the coordinate
system constructed in step 2 is (17), where now $I \subseteq [n]$ for $n \geq 4$. For more
details see Section 6.



L-CUMULANTS, L-CUMULANT EMBEDDINGS AND ALGEBRAIC STATISTICS

Of course, since we can pick any trivalent tree in step 1, the most natural choice
is to pick the caterpillar tree. This is mainly because the computation of the
corresponding tree cumulants is simple, as was presented earlier in this section.
Now from the parametrization in (17) we easily verify that

(19) $t_{I \cup J}\, t_{I' \cup J'} - t_{I \cup J'}\, t_{I' \cup J} = 0$

holds on $\mathrm{Sec}((\mathbb{P}^1)^n)$ for all non-empty subsets $I, I' \subseteq A$ and $J, J' \subseteq B$ where A|B
is a split of [n].
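
For n = 4 both the closed form (17) and one of the binomial relations can be checked with exact arithmetic. The Python sketch below is an illustration under stated assumptions: the parameter values are hypothetical, the function names are ours, and the caterpillar tree cumulant is computed only for $2 \leq |I| \leq 4$ as $\mu'_I$, corrected by $\mu'_{12}\mu'_{34}$ when I = {1,2,3,4}.

```python
from fractions import Fraction
from itertools import combinations


def mixture_moment(t, a, b, I):
    """Equation (14): mu_I = (1 - t) * prod_{i in I} a_i + t * prod_{i in I} b_i."""
    prod_a = prod_b = Fraction(1)
    for i in I:
        prod_a *= a[i]
        prod_b *= b[i]
    return (1 - t) * prod_a + t * prod_b


def central_moment(t, a, b, I):
    """Central moment of X_I via the signed subset expansion of moments."""
    total = Fraction(0)
    for size in range(len(I) + 1):
        for B in combinations(I, size):
            term = (-1) ** (len(I) - size) * mixture_moment(t, a, b, B)
            for i in I:
                if i not in B:
                    term *= mixture_moment(t, a, b, (i,))
            total += term
    return total


def tree_cumulant(t, a, b, I):
    """Caterpillar tree cumulant for 2 <= |I| <= 4 (interval partitions without singletons)."""
    I = tuple(sorted(I))
    value = central_moment(t, a, b, I)
    if len(I) == 4:
        value -= central_moment(t, a, b, I[:2]) * central_moment(t, a, b, I[2:])
    return value


def closed_form(t, a, b, I):
    """Right-hand side of (17)."""
    value = t * (1 - t) * (1 - 2 * t) ** (len(I) - 2)
    for i in I:
        value *= b[i] - a[i]
    return value


# Hypothetical parameter values, chosen only for illustration.
t = Fraction(1, 3)
a = {1: Fraction(1, 5), 2: Fraction(1, 4), 3: Fraction(1, 2), 4: Fraction(2, 3)}
b = {1: Fraction(3, 5), 2: Fraction(3, 4), 3: Fraction(1, 7), 4: Fraction(1, 9)}

subsets = [I for k in (2, 3, 4) for I in combinations((1, 2, 3, 4), k)]
matches = all(tree_cumulant(t, a, b, I) == closed_form(t, a, b, I) for I in subsets)
binomial = (tree_cumulant(t, a, b, (1, 3)) * tree_cumulant(t, a, b, (2, 4))
            - tree_cumulant(t, a, b, (1, 4)) * tree_cumulant(t, a, b, (2, 3)))
```

With these inputs `matches` confirms (17) for every subset of size at least two, and `binomial` evaluates the relation $t_{13}t_{24} - t_{14}t_{23}$ associated with the split 12|34.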

Remark 3.12. It may seem that a more natural way to proceed in Example 3.11 was
to construct tree cumulants induced directly by partitions of the 4-star tree in Figure
3. The tree partitions of the 4-star tree are equal to one-cluster partitions from
Definition 2.2. By Proposition 5.5 this partition lattice induces central moments
$\mu'_I$. To compute the induced parametrization of the central moments note that
$\mu'_I = k_I$ for all $2 \leq |I| \leq 3$. A direct check shows that

$\mu'_{1234} = t(1-t)(3t^2 - 3t + 1) \prod_{i=1}^{4} (b_i - a_i)$

and we find that the relation between $\mu'_{1234}$ and other central moments is more
complicated than in the case of tree cumulants induced by the caterpillar tree. In
particular, the corresponding equations are no longer binomial like in (19).

4. The definition of L-cumulants 

Let $A = \{i_1, \ldots, i_d\}$ be a multiset. We define its multisubset $B \subseteq A$ as a multiset
$B = \{i_j : j \in I\}$ for some $I \subseteq [d]$. For example if A = {1, 1, 2, 2} then A has, among
others, four multisubsets of the form {1,2}. Let X be a finite discrete random vector
with values in $\mathcal{X}$ and let $\mathcal{A}(X)$ be the family of multisets associated to X as given
in (4). Consider any family $L = (L(A))_{A \in \mathcal{A}(X)}$ of partition lattices such that L(A)
is a subposet of $\Pi(A)$ for every $A \in \mathcal{A}(X)$. Assume that the maximal and minimal
elements of L(A) coincide with the maximal and the minimal element of $\Pi(A)$
and denote them by $A$ and $0_A$ respectively. Moreover, for every $B \subseteq A$ the maps
$L(A) \to L(B)$ given by constraining partitions of A to B are surjections. Note that
in particular, L(A) need not be a sublattice of $\Pi(A)$ because the join and the meet
operator of L(A) and $\Pi(A)$ may differ.

The first two trivial examples of a family L as above are $\Pi = (\Pi(A))_{A \in \mathcal{A}(X)}$ and
L such that for every $A \in \mathcal{A}(X)$, $|A| \geq 2$, the lattice L(A) is given by just two
elements $0_A$ and $A$. Other interesting examples are obtained from Definition 2.2
(excluding tree partitions), where L(A) is assumed to be isomorphic to L([|A|]). The
corresponding families of lattices are denoted by NC (non-crossing), I (interval) and
C (one-cluster). A definition of tree cumulants in this case requires construction
of an A-labelled tree, where A is the maximal multiset in $\mathcal{A}(X)$ corresponding
to $x = (r_1 - 1, \ldots, r_n - 1)$. This construction is not unique and for that reason we
discuss tree cumulants only in very concrete examples.

By $m_A$ we denote the Möbius function on L(A). The lattice will always be
obvious from the context so we omit it in the notation. When A is also clear from
the context we just write m.

Definition 4.1 (L-cumulants). Let $X = (X_1, \ldots, X_n)$ be a random vector. For
any $A \in \mathcal{A}(X)$ and $\nu \in L(A)$ define

(20) $\ell(\nu) = \sum_{\pi \leq \nu} m_A(\pi, \nu)\, \mu(\pi)$,

where $\mu(\pi) = \prod_{B \in \pi} \mu_B$. Then $\ell_A := \ell(A)$ is the L-cumulant of $X_A$.

If $L = \Pi$ then, because $\Pi(A) \simeq \Pi([|A|])$, we obtain the formula in (3) and
hence this definition generalizes the classical cumulants. Other known L-cumulants 
were defined in the non-commutative probability literature. These are L-cumulants 
defined by NC and I, which are called free cumulants and Boolean cumulants 
respectively (see [20, 21]). 
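
Definition 4.1 can be turned into a small computation once the lattice is given as an explicit list of partitions. The Python sketch below is one possible illustration, not a reference implementation: the helper names are ours, the moment values in `mu` are hypothetical numbers used only to compare formulas, and the Möbius function $m(\pi, A)$ is obtained from the defining recursion $\sum_{\pi \leq \sigma \leq A} m(\sigma, A) = 0$ for $\pi < A$. For A = {1,2,3} it recovers the classical cumulant when L = Π and the Boolean cumulant when L is the lattice of interval partitions.

```python
from fractions import Fraction
from functools import lru_cache


def set_partitions(elements):
    """All set partitions of `elements`, each a frozenset of frozenset blocks."""
    elements = list(elements)
    if not elements:
        return [frozenset()]
    first, rest = elements[0], elements[1:]
    result = []
    for part in set_partitions(rest):
        result.append(part | {frozenset([first])})          # `first` as a singleton block
        for block in part:                                  # or added to an existing block
            result.append((part - {block}) | {block | {first}})
    return result


def refines(pi, nu):
    """True if pi <= nu, i.e. every block of pi lies inside a block of nu."""
    return all(any(block <= other for other in nu) for block in pi)


def l_cumulant(lattice, moment):
    """Equation (20) with nu the one-block partition: sum_pi m(pi, top) * prod_B mu_B."""
    top = min(lattice, key=len)  # assumes the one-block partition belongs to the lattice

    @lru_cache(maxsize=None)
    def mobius(pi):
        # m(top, top) = 1 and m(pi, top) = -sum of m(sigma, top) over pi < sigma <= top
        if pi == top:
            return 1
        return -sum(mobius(s) for s in lattice if s != pi and refines(pi, s))

    total = Fraction(0)
    for pi in lattice:
        term = Fraction(mobius(pi))
        for block in pi:
            term *= moment(block)
        total += term
    return total


# Hypothetical moment values mu_B, used only to compare the formulas.
mu = {
    frozenset({1}): Fraction(1, 2), frozenset({2}): Fraction(1, 3),
    frozenset({3}): Fraction(1, 4), frozenset({1, 2}): Fraction(1, 5),
    frozenset({1, 3}): Fraction(1, 6), frozenset({2, 3}): Fraction(1, 7),
    frozenset({1, 2, 3}): Fraction(1, 8),
}
full = set_partitions([1, 2, 3])                              # L = Pi([3])
non_interval = frozenset({frozenset({1, 3}), frozenset({2})})
interval = [p for p in full if p != non_interval]             # L = I([3])

k123 = l_cumulant(full, mu.__getitem__)
boolean123 = l_cumulant(interval, mu.__getitem__)
```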

The map (20) is invertible with the inverse given by the Möbius inversion formula
in Proposition 2.4. Thus for every $A \in \mathcal{A}(X)$

(21) $\mu_A = \mu(A) = \sum_{\pi \in L(A)} \ell(\pi)$.

Note that in general $\ell(\pi) \neq \prod_{B \in \pi} \ell_B$, in contrast to the case of classical cumulants. However,
$\ell(\pi) = \prod_{B \in \pi} \ell_B$ whenever L satisfies the following condition:

(C0) For every $A \in \mathcal{A}(X)$ and for any two partitions $\pi, \nu \in L(A)$ the interval
$[\pi, \nu]$ is isomorphic to the product of intervals $\prod_{B \in \nu} [\pi(B), B] \subseteq \prod_{B \in \nu} L(B)$.

Condition (C0) is not very restrictive. In fact all partition lattices mentioned in
Definition 2.2 satisfy this property. If (C0) holds, then, by Proposition 2.3, the
Möbius function on L(A) satisfies $m_A(\pi, \nu) = \prod_{B \in \nu} m_B(\pi(B), B)$. In particular
(21) becomes

$\mu_A = \sum_{\pi \in L(A)} \prod_{B \in \pi} \ell_B$

and the proof of this follows essentially the proof of Lemma 3.1.

Remark 4.2. By the moment aliasing there is a one-to-one correspondence between
the probabilities $P = [p(x)]_{x \in \mathcal{X}}$ and moments $M = [\mu_A]_{A \in \mathcal{A}(X)}$ and hence also
L-cumulants $C = [\ell_A]_{A \in \mathcal{A}(X)}$.

Unlike in the case of cumulants, for general L-cumulants no generating function
is known. It may then be useful to realize that L-cumulants can be expressed
in terms of classical cumulants in a rather simple manner. The following result
generalizes Theorem 4.1 in [ ].

Proposition 4.3. Let L(A) be a lattice of set partitions of A in the family L and
let $\Pi^*$ denote the set of elements $\pi \in \Pi(A)$ such that $[\pi, A] \cap L(A) = \{A\}$, where
the interval $[\pi, A]$ is taken in $\Pi(A)$. We have

$\ell_A = \sum_{\pi \in \Pi^*} k(\pi) = \sum_{\pi \in \Pi^*} \prod_{B \in \pi} k_B$.

Proof. In this proof $\delta \leq_\Pi \pi$ means that $\delta \leq \pi$ and $\delta \in \Pi(A)$. Similarly $\pi \geq_L \delta$
denotes $\pi \geq \delta$ and $\pi \in L(A)$. Expressing the L-cumulant in terms of moments and
then the moments in terms of classical cumulants gives

$\ell_A = \sum_{\pi \in L(A)} m(\pi) \prod_{B \in \pi} \Big( \sum_{\delta \in \Pi(B)} \prod_{C \in \delta} k_C \Big) = \sum_{\pi \in L(A)} m(\pi) \sum_{\delta \leq_\Pi \pi} \prod_{B \in \delta} k_B$.

For every $\delta \in \Pi(A)$ let $\bar\delta$ denote the smallest element of L(A) such that $\delta \leq_\Pi \bar\delta$.
Then, by changing the order of summation, the above equation can be rewritten as

$\ell_A = \sum_{\delta \in \Pi(A)} \prod_{B \in \delta} k_B \Big( \sum_{\pi \geq_L \bar\delta} m(\pi) \Big)$.

By (7) the sum in brackets vanishes whenever $\bar\delta \neq A$. Therefore the whole expression
is equal to $\sum_{\delta \in \Pi^*} \prod_{B \in \delta} k_B$. □

Example 4.4. Let n = 4 and let L be the lattice of all set partitions in Figure
2. The only partitions of $\Pi([4])$ which are not in L are 13|24 and 14|23. Hence,
apart from the one-block partition [4] itself, they are also the only partitions
satisfying the condition $[\pi, [4]] \cap L = \{[4]\}$. This, by
Proposition 4.3, gives the formula for $t_{1234}$ given in (16).
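
The set $\Pi^*$ of Proposition 4.3 and the resulting formula (16) can be verified by brute force for n = 4. The Python sketch below is only an illustration: the helper names are ours, the joint distribution is a hypothetical example, and the lattice of tree partitions is entered by hand as all partitions of {1,2,3,4} except 13|24 and 14|23.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product


def set_partitions(elements):
    """All set partitions of `elements`, each a frozenset of frozenset blocks."""
    elements = list(elements)
    if not elements:
        return [frozenset()]
    first, rest = elements[0], elements[1:]
    result = []
    for part in set_partitions(rest):
        result.append(part | {frozenset([first])})
        for block in part:
            result.append((part - {block}) | {block | {first}})
    return result


def refines(pi, nu):
    """True if pi <= nu in the refinement order."""
    return all(any(block <= other for other in nu) for block in pi)


def l_cumulant(lattice, moment):
    """sum over pi in the lattice of m(pi, top) * prod_{B in pi} mu_B."""
    top = min(lattice, key=len)

    @lru_cache(maxsize=None)
    def mobius(pi):
        if pi == top:
            return 1
        return -sum(mobius(s) for s in lattice if s != pi and refines(pi, s))

    total = Fraction(0)
    for pi in lattice:
        term = Fraction(mobius(pi))
        for block in pi:
            term *= moment(block)
        total += term
    return total


# Hypothetical joint pmf on {0,1}^4 with weights proportional to k^2 + 1.
outcomes = list(product((0, 1), repeat=4))
weights = [Fraction(k * k + 1) for k in range(16)]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}


def moment(block):
    """mu_B for binary variables: probability that all X_i with i in B equal 1."""
    return sum((p for x, p in pmf.items() if all(x[i - 1] == 1 for i in block)), Fraction(0))


def cov(i, j):
    return moment({i, j}) - moment({i}) * moment({j})


full = set_partitions([1, 2, 3, 4])
top = frozenset({frozenset({1, 2, 3, 4})})
p13_24 = frozenset({frozenset({1, 3}), frozenset({2, 4})})
p14_23 = frozenset({frozenset({1, 4}), frozenset({2, 3})})
tree_lattice = [p for p in full if p not in (p13_24, p14_23)]

# Pi*: partitions whose only upper bound inside the tree lattice is the top element.
pi_star = [p for p in full if all(s == top for s in tree_lattice if refines(p, s))]

k1234 = l_cumulant(full, moment)          # classical cumulant
t1234 = l_cumulant(tree_lattice, moment)  # tree cumulant of the caterpillar tree
```

Here `pi_star` consists of 1234, 13|24 and 14|23, and the difference `t1234 - k1234` equals $k_{13}k_{24} + k_{14}k_{23}$, in agreement with (16).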



5. Basic properties of L-cumulants 

In this section we show that L-cumulants satisfy properties similar to (P1)-(P4). 
The following lemma is central to most of the proofs of this section. It was first 
formulated by Weisner [24] in a special case and then generalized by Rota [13] for 
general lattices (see the corollary on page 351 therein). 

Lemma 5.1. Let L be a finite lattice with at least two elements, and let $\pi_0 \in L$ be
such that $\pi_0 \neq 1$. Then for any $\delta \in L$

$\sum_{\pi :\, \pi \wedge \pi_0 = \delta} m(\pi) = 0$.

A special case of this result, when $\delta = 0$, is given in [22, Corollary 3.9.3]. It is
a useful exercise to see that the proof given there generalizes to provide a proof of 
Lemma 5.1. 

5.1. Independence and semi-invariance. To show that property (P1) holds for
L-cumulants we first prove a more algebraic version of this result. This result is
directly linked to the definition of independence formulated in terms of moments
in (5).

Theorem 5.2. Consider the L-cumulant of $X = (X_1, \ldots, X_n)$ as in Definition 4.1.
The following are equivalent:

(i) There exists a partition $\pi_0 \in L$ such that $\pi_0 \neq [n]$ and for every $\pi \in L$ we
have that $\mu(\pi) = \mu(\pi \wedge \pi_0)$,

(ii) $\mu_I = \mu(\pi_0(I))$ for all $I \subseteq [n]$,

(iii) $\ell(\pi) = 0$ for all $\pi \not\leq \pi_0$,

(iv) $\ell_I = 0$ unless I is contained in a single block of $\pi_0$.



PIOTR ZWIERNIK




Proof. The equivalence of (i) and (ii) follows from the fact that $\mu(\pi)$ is a
multiplicative function on L. Hence (i)⇒(ii) follows by taking $\pi = [n]$ and then
constraining to elements of I. The opposite implication follows by taking I to be blocks
of $\pi$. We now prove (i)⇒(iii). Using Definition 4.1 we obtain

$\ell(\nu) = \sum_{\pi \leq \nu} m(\pi, \nu)\mu(\pi) = \sum_{\pi \leq \nu} m(\pi, \nu)\mu(\pi \wedge \pi_0) = \sum_{\delta \leq \nu \wedge \pi_0} \Big( \sum_{\pi} m(\pi, \nu) \Big) \mu(\delta)$,

where the inner sum in the last expression is over all $\pi$ in $[0, \nu]$ such that $\pi \wedge \pi_0 = \delta$
(or $\pi \wedge (\pi_0 \wedge \nu) = \delta$). To show (iii), we are interested only in $\nu \not\leq \pi_0$ and hence
we can assume that $\nu \neq 0$. The interval $[0, \nu] \subseteq L$ is a lattice with at least two
elements, and, whenever $\nu \not\leq \pi_0$, also $\pi_0 \wedge \nu \neq \nu$. Therefore, by Lemma 5.1 for all
$\delta \leq \nu \wedge \pi_0$ the sum $\sum_{\pi \wedge \pi_0 = \delta} m(\pi, \nu)$ vanishes. Hence $\ell(\nu) = 0$ unless $\nu \leq \pi_0$.
To show (iii)⇒(i) note that if $\ell(\delta) = 0$ for all $\delta \not\leq \pi_0$ then for every $\pi \in L$

$\mu(\pi) = \sum_{\delta \leq \pi} \ell(\delta) = \sum_{\delta \leq \pi \wedge \pi_0} \ell(\delta) = \mu(\pi \wedge \pi_0)$.

To see that (iv) follows from (i) and (iii), apply (i) with L(I) in place of L. If
I is not contained in a block of $\pi_0$ then $\pi_0(I)$ is not the maximal element of L(I)
and by (i) this gives $\mu(\pi) = \mu(\pi \wedge \pi_0(I))$ for every $\pi \in L(I)$. Now $\ell_I = 0$ by (iii).

Finally we show that (iv) implies (ii) using induction with respect to |I|. If
I = {i,j} such that i and j lie in different blocks of $\pi_0$ then $\pi_0(I) = i|j$. Since
$\ell_{ij} = \mu_{ij} - \mu_i\mu_j = 0$, (ii) holds if |I| = 2. Suppose now that (ii) holds for all
|I| < d and let now $I \subseteq [n]$ be such that |I| = d and $\pi_0(I) \neq I$ (otherwise (ii) holds
trivially). By (20) we have

$\ell_I = \sum_{\pi \in L(I)} m(\pi)\mu(\pi)$.

If $\pi < I$ then $\mu(\pi)$ is a product of some $\mu_B$, where |B| < d, and hence by assumption
$\mu(\pi) = \mu(\pi \wedge \pi_0(I))$. We can rewrite the above equation as

(22) $\ell_I = \mu_I - \mu(\pi_0(I)) + \sum_{\pi \in L(I)} m(\pi)\mu(\pi \wedge \pi_0(I))$.

The last summand can be rewritten as

$\sum_{\delta \leq \pi_0(I)} \Big[ \sum_{\pi \wedge \pi_0(I) = \delta} m(\pi) \Big] \mu(\delta)$,

which is zero by Lemma 5.1 because $\pi_0(I) \neq I$. Therefore, (22) becomes $\ell_I =
\mu_I - \mu(\pi_0(I))$. Since $\ell_I = 0$ by assumption, we obtain that (ii) holds for |I| = d
and hence it holds for all $I \subseteq [n]$. □

This result gives an immediate corollary which generalizes property (P1) of the
classical cumulants.

Proposition 5.3. Suppose there exists a partition $\pi_0 \in L$ such that $\perp\!\!\!\perp_{B \in \pi_0} X_B$.
Then $\ell(\pi) = 0$ for all $\pi \not\leq \pi_0$ or equivalently $\ell_A = 0$ unless all the elements of A
are contained in a single block of $\pi_0$.

This proposition shows one of the important features of L-cumulants. For cumulants,
by (P1), all marginal independencies imply that $k_{1 \cdots n} = 0$. In the case of
L-cumulants only some of the independencies imply vanishing (see Example 3.4).
Hence, this new coordinate system can be designed to better fit the model under 
consideration. This concept will be explained in more detail for tree cumulants in 
Section 6. 

We formulate an additional condition on the family of lattices L, which we require 
to hold only when this is explicitly stated. 

(C1) For every $A \in \mathcal{A}(X)$ and every $i \in A$ the split $i|(A \setminus i)$ is in L(A).

Among the partitions in Definition 2.2 only the lattice of interval partitions does
not satisfy (C1).

Proposition 5.4 (Semi-invariance). Let L satisfy (C1) and $\tilde X = X + a$, where $a \in
\mathbb{R}^n$ is any constant vector. Denote by $\tilde\ell_A$ the L-cumulant of $\tilde X_A$. Then $\tilde\ell_i = \ell_i + a_i$
for all i = 1, . . . , n and $\tilde\ell_A = \ell_A$ for any multiset $A \in \mathcal{A}(X)$ such that $|A| \geq 2$.

Proof. Without loss of generality assume A = [n]. Since $a = \sum_i a_i e_i$, where the
$e_i$'s are the unit vectors in $\mathbb{R}^n$, it suffices to prove this result only in the case when
a is such that $a_1$ is the only non-zero entry. In this case write $\tilde X_1 = X_1 + a_1$ as
$\tilde X_1 = (X_1 - \mu_1) + (a_1 + \mu_1)$, where $\mu_1 = EX_1$ and $a_1 + \mu_1 = E\tilde X_1$. Hence, if the split
$\pi_0 = 1|\{2, \ldots, n\} \in L$ then for every $\pi \in L$,

$\tilde\mu(\pi) = \mu(\pi) - \mu(\pi \wedge \pi_0) + \tilde\mu(\pi \wedge \pi_0)$.

It follows that

(23) $\tilde\ell_{1 \cdots n} = \sum_{\pi \in L} m(\pi)\mu(\pi) - \sum_{\pi \in L} m(\pi)\mu(\pi \wedge \pi_0) + \sum_{\pi \in L} m(\pi)\tilde\mu(\pi \wedge \pi_0)$.

Since L is a lattice and $\pi_0 \neq [n]$, by Lemma 5.1 we have that $\sum_{\pi \wedge \pi_0 = \nu} m(\pi) = 0$
for each $\nu \in L$ and hence the second and third summand in (23) are zero. The
proof is completed because the first summand is exactly $\ell_{1 \cdots n}$. □
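
The role of condition (C1) can be seen in a small numerical experiment. The Python sketch below is an illustration only, with a hypothetical joint distribution and function names of our choosing. It compares the classical third cumulant with the L-cumulant of the interval lattice I([3]), which contains the splits 1|23 and 12|3 but not 2|13. Shifting the first or third variable leaves the interval L-cumulant unchanged, while shifting the second variable by a changes it by $a \cdot \mathrm{Cov}(X_1, X_3)$. Coordinates are 0-based in the code.

```python
from fractions import Fraction
from itertools import product


def expectation(pmf, shift, idx):
    """E[prod_{i in idx} (X_i + shift[i])] for a pmf stored as {outcome: probability}."""
    total = Fraction(0)
    for x, p in pmf.items():
        term = p
        for i in idx:
            term *= x[i] + shift[i]
        total += term
    return total


def classical_k123(pmf, shift):
    """Classical cumulant of the shifted vector (L = Pi([3]))."""
    m = lambda *idx: expectation(pmf, shift, idx)
    return (m(0, 1, 2) - m(0) * m(1, 2) - m(1) * m(0, 2) - m(2) * m(0, 1)
            + 2 * m(0) * m(1) * m(2))


def interval_l123(pmf, shift):
    """L-cumulant of the shifted vector for the interval lattice I([3])."""
    m = lambda *idx: expectation(pmf, shift, idx)
    return m(0, 1, 2) - m(0) * m(1, 2) - m(0, 1) * m(2) + m(0) * m(1) * m(2)


# Hypothetical joint pmf on {0,1}^3 with weights proportional to k^2 + 1.
outcomes = list(product((0, 1), repeat=3))
weights = [Fraction(k * k + 1) for k in range(8)]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}

zero = (0, 0, 0)
a = Fraction(5, 2)
cov_13 = expectation(pmf, zero, (0, 2)) - expectation(pmf, zero, (0,)) * expectation(pmf, zero, (2,))
change_middle = interval_l123(pmf, (0, a, 0)) - interval_l123(pmf, zero)
```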

The following result shows that the central moments are L-cumulants induced 
by the lattice of one-cluster partitions C([n]). 

Proposition 5.5. Let X be a random vector with values in $\mathcal{X}$. Then the central
moments $\mu'_A$ for $|A| \geq 2$ are equal to the corresponding L-cumulants induced by
$C = (C(A))_{A \in \mathcal{A}(X)}$.

Proof. Denote by $c_A$ the L-cumulants induced by the family C of one-cluster
partition lattices. Let $A \in \mathcal{A}(X)$ be such that $|A| \geq 2$. Since every split of the form
$i|(A \setminus i)$ is a one-cluster partition, by Proposition 5.4, we can write $c_A$ in terms of
the central moments

$c_A = \sum_{\pi \in C(A)} m(\pi) \prod_{B \in \pi} \mu'_B$ for all $|A| \geq 2$.

However, $\mu'_i = 0$ for every $i \in [n]$ and hence the only non-zero term of the above
sum is where $\pi = A$, which proves that $c_A = \mu'_A$. □

The correspondence between the lattice of one-cluster partitions and central moments
gives also the following explicit, simple and computationally efficient formula
for central moments in terms of moments.

Lemma 5.6. Let X be a random vector with values in $\mathcal{X}$. For every $A \in \mathcal{A}(X)$
such that $|A| \geq 2$ we have:

(24) $\mu'_A = \sum_{B \subseteq A} (-1)^{|A \setminus B|} \mu_B \prod_{i \in A \setminus B} \mu_i$.

Proof. Use (8) and Proposition 5.5 to write

$\mu'_A = \sum_{0 < \pi \in C(A)} (-1)^{|\pi|-1} \prod_{B \in \pi} \mu_B + (-1)^{|A|-1}(|A|-1) \prod_{i \in A} \mu_i$.

Let $B_0$ be the distinguished non-singleton block in each of the products $\prod_{B \in \pi} \mu_B$
above. Then $|\pi| - 1 = |A \setminus B_0|$. Hence, every $\prod_{B \in \pi} \mu_B$ corresponds to some
$\mu_{B_0} \prod_{i \in A \setminus B_0} \mu_i$ in (24) with the same coefficient. The remaining part is to check
that the coefficient of $\prod_{i \in A} \mu_i$ is also the same, but this is an easy check. □

Example 5.7. Let A = {1, 1, 2, 2} and list all multisubsets of A as defined in the
beginning of Section 4. We easily check that

$\mu'_{1122} = \mu_{1122} - 2\mu_1\mu_{122} - 2\mu_2\mu_{112} + \mu_{11}\mu_2^2 + 4\mu_{12}\mu_1\mu_2 + \mu_1^2\mu_{22} - 3\mu_1^2\mu_2^2$,

which can be verified also by hand.
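
The multiset case of (24) can also be checked mechanically. In the Python sketch below, which is an illustration with a hypothetical distribution and names of our choosing, a multiset is a tuple of 0-based variable labels, so the paper's A = {1, 1, 2, 2} becomes `(0, 0, 1, 1)`, and multisubsets are enumerated as subsets of positions in that tuple. The variables take values in {0, 1, 2} so that $X_i^2 \neq X_i$ in general.

```python
from fractions import Fraction
from itertools import combinations, product


def moment(pmf, labels):
    """E[prod_{v in labels} X_v]; labels may repeat a variable."""
    total = Fraction(0)
    for x, p in pmf.items():
        term = p
        for v in labels:
            term *= x[v]
        total += term
    return total


def central_moment_24(pmf, A):
    """Right-hand side of (24); multisubsets of A are indexed by subsets of positions."""
    positions = range(len(A))
    total = Fraction(0)
    for size in range(len(A) + 1):
        for chosen in combinations(positions, size):
            B = [A[j] for j in chosen]
            rest = [A[j] for j in positions if j not in chosen]
            term = (-1) ** len(rest) * moment(pmf, B)
            for v in rest:
                term *= moment(pmf, [v])
            total += term
    return total


def central_moment_direct(pmf, A):
    """E[prod_{v in A} (X_v - E X_v)] computed from the definition."""
    means = {v: moment(pmf, [v]) for v in set(A)}
    total = Fraction(0)
    for x, p in pmf.items():
        term = p
        for v in A:
            term *= x[v] - means[v]
        total += term
    return total


def example_5_7(pmf):
    """The expansion of Example 5.7 with the paper's labels 1, 2 mapped to 0, 1."""
    m = lambda *labels: moment(pmf, labels)
    return (m(0, 0, 1, 1) - 2 * m(0) * m(0, 1, 1) - 2 * m(1) * m(0, 0, 1)
            + m(0, 0) * m(1) ** 2 + 4 * m(0, 1) * m(0) * m(1)
            + m(0) ** 2 * m(1, 1) - 3 * m(0) ** 2 * m(1) ** 2)


# Hypothetical joint pmf of (X_1, X_2) on {0,1,2}^2.
outcomes = list(product((0, 1, 2), repeat=2))
weights = [Fraction(3 * x0 + x1 + 1) for x0, x1 in outcomes]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}

A = (0, 0, 1, 1)
value = central_moment_24(pmf, A)
```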

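5.2. Multilinear transformations. By property (P3) cumulants behave nicely
under multilinear transformations. In this section, to study similar properties for
general L-cumulants, we restrict to L satisfying the following condition.

(C2) For every $A \in \mathcal{A}(X)$ the lattice L(A) is isomorphic to L([d]), where d = |A|.

This property is satisfied by construction for $\Pi$, I, NC, and C. If (C2) holds then,
for every d-tuple $(i_1, \ldots, i_d) \in [n]^d$ we define $\mathcal{L}^{(d)}$ as an $n \times \cdots \times n$ tensor of the form

(25) $\ell^{(d)}_{i_1 \ldots i_d} = \sum_{\pi \in L([d])} m(\pi) \prod_{B \in \pi} \mu_{i_B}$.

Note that in general $\ell^{(d)}_{i_1 \ldots i_d}$ may differ from $\ell_{i_1 \ldots i_d}$. For example if L = I([3]) then
$\ell_{213} = \ell_{123}$ because the definition of L-cumulants does not depend on the ordering
of the elements in [n]. On the other hand, we have $\ell^{(3)}_{123} \neq \ell^{(3)}_{213}$ because

$\ell^{(3)}_{123} = \mu_{123} - \mu_1\mu_{23} - \mu_{12}\mu_3 + \mu_1\mu_2\mu_3$

and

$\ell^{(3)}_{213} = \mu_{123} - \mu_2\mu_{13} - \mu_{12}\mu_3 + \mu_1\mu_2\mu_3$.

The following proposition shows that the tensor $\mathcal{L}^{(d)}$, for any $d \geq 1$, under linear
mappings transforms as a contravariant tensor.

Proposition 5.8. Let $X = (X_1, \ldots, X_n)$ be a random vector. Consider L-cumulants
defined by L satisfying (C2). Let $Q = [q_{ij}] \in \mathbb{R}^{m \times n}$ and $\tilde X = QX \in \mathbb{R}^m$. Define
$[\tilde\mu_A]$, $[\tilde\ell_A]$ and $\tilde{\mathcal{L}}^{(d)}$ as counterparts of $[\mu_A]$, $[\ell_A]$ and $\mathcal{L}^{(d)}$ for $\tilde X$ accordingly. Then
$\tilde{\mathcal{L}}^{(d)} = Q \cdot \mathcal{L}^{(d)}$ for each $d \geq 1$,
where $Q \cdot$ is the multilinear action on a d-dimensional tensor defined by

(26) $(Q \cdot \mathcal{L}^{(d)})_{i_1 \ldots i_d} = \sum_{j_1=1}^{n} \cdots \sum_{j_d=1}^{n} q_{i_1 j_1} \cdots q_{i_d j_d}\, \ell^{(d)}_{j_1 \ldots j_d}$

for each $d \geq 1$ and $i_1, \ldots, i_d \in [m]$.

Proof. By (25) we have

$(Q \cdot \mathcal{L}^{(d)})_{i_1 \ldots i_d} = \sum_{j_1=1}^{n} \cdots \sum_{j_d=1}^{n} q_{i_1 j_1} \cdots q_{i_d j_d} \Big( \sum_{\pi \in L([d])} m(\pi) \prod_{B \in \pi} \mu_{j_B} \Big)$.

Write $\mu_{j_B}$ explicitly as $E[\prod_{b \in B} X_{j_b}]$. Then, using (C2), after changing the ordering
of products and summations we obtain

$(Q \cdot \mathcal{L}^{(d)})_{i_1 \ldots i_d} = \sum_{\pi \in L([d])} m(\pi) \prod_{B \in \pi} E\Big[ \prod_{b \in B} \Big( \sum_{j_b=1}^{n} q_{i_b j_b} X_{j_b} \Big) \Big]$.

Since $\tilde X_{i_b} = \sum_{j_b=1}^{n} q_{i_b j_b} X_{j_b}$ we obtain

$(Q \cdot \mathcal{L}^{(d)})_{i_1 \ldots i_d} = \sum_{\pi \in L([d])} m(\pi) \prod_{B \in \pi} \tilde\mu_{i_B} = \tilde\ell^{(d)}_{i_1 \ldots i_d}$,

which finishes the proof. □

Although for some L the property (P3) may not hold, the homogeneity holds for
all L-cumulants. Thus, if $\tilde X = (\lambda_1 X_1, \ldots, \lambda_n X_n)$ for some $\lambda = (\lambda_1, \ldots, \lambda_n) \in (K^*)^n$
then $\tilde\ell_A = \prod_{i \in A} \lambda_i\, \ell_A$ for every $A \in \mathcal{A}(X)$.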
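
The transformation rule for the tensor defined in (25) can be tested on a small example. The Python sketch below is an illustration under stated assumptions: it takes d = 3 and the interval lattice I([3]), for which the tensor entries are $\mu_{ijk} - \mu_i\mu_{jk} - \mu_{ij}\mu_k + \mu_i\mu_j\mu_k$, and it uses a hypothetical joint distribution and a hypothetical 2 x 3 matrix `Q`. Indices are 0-based and the function names are ours.

```python
from fractions import Fraction
from itertools import product


def boolean_tensor(pmf, vector, dim):
    """Tensor ell^(3) for the interval lattice I([3]) of the random vector `vector(x)`."""
    def mu(*idx):
        total = Fraction(0)
        for x, p in pmf.items():
            v = vector(x)
            term = p
            for j in idx:
                term *= v[j]
            total += term
        return total

    return {
        (i, j, k): mu(i, j, k) - mu(i) * mu(j, k) - mu(i, j) * mu(k) + mu(i) * mu(j) * mu(k)
        for i, j, k in product(range(dim), repeat=3)
    }


def act(Q, tensor, n):
    """Multilinear action (26) of an m x n matrix Q on a 3-dimensional tensor."""
    m = len(Q)
    return {
        (i1, i2, i3): sum(
            Q[i1][j1] * Q[i2][j2] * Q[i3][j3] * tensor[(j1, j2, j3)]
            for j1, j2, j3 in product(range(n), repeat=3)
        )
        for i1, i2, i3 in product(range(m), repeat=3)
    }


# Hypothetical joint pmf on {0,1}^3 and a hypothetical 2 x 3 matrix Q.
outcomes = list(product((0, 1), repeat=3))
weights = [Fraction(k * k + 1) for k in range(8)]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}
Q = [[1, 2, 0], [-1, 1, 3]]

tensor_x = boolean_tensor(pmf, lambda x: x, 3)
tensor_qx = boolean_tensor(pmf, lambda x: [sum(q * xi for q, xi in zip(row, x)) for row in Q], 2)
transformed = act(Q, tensor_x, 3)
```

The same data also illustrate that the tensor is not symmetric for the interval lattice: the entries of `tensor_x` at (0, 1, 2) and (1, 0, 2) differ.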

5.3. Conditional L-cumulants. Suppose we are given the conditional cumulants 
of X = [X\, . . . , X n ) conditional on some random variable Y and we want to obtain 
the unconditional cumulants. This is a common problem with hidden variable 
models. On the level of moments this relationship is straightforward since 



$\mu_A = E\big[\prod_{i \in A} X_i\big] = E\big[E\big[\prod_{i \in A} X_i \mid Y\big]\big]$


for every multiset A. For cumulants, or more generally for L-cumulants, the situa- 
tion is a bit more complicated. 

For every multiset $A \in \mathcal{A}(X)$ denote by $k^Y_A$ the conditional cumulant of $X_A$ given
Y, that is a cumulant computed as in Definition 4.1 but with moments replaced
by conditional moments. Note that each $k^Y_A$ is itself a random variable. For any
$\pi \in \Pi(A)$, by $k_\pi$ denote the cumulant of the random vector $(k^Y_B)_{B \in \pi}$. It is known
from [ ] that for every $A \in \mathcal{A}(X)$:

(27) $k_A = \sum_{\pi \in \Pi(A)} k_\pi$.

This in particular generalizes the well-known formula

Cov(X,Z) = E[Cov(X,Z|Y)] + Cov(E[X|Y],E[Z|Y]).
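
The covariance decomposition above is the case |A| = 2 of (27) and is easy to confirm numerically. The following Python sketch is an illustration only, with a hypothetical joint distribution of (X, Z, Y) and function names of our choosing; it computes the three terms separately.

```python
from fractions import Fraction
from itertools import product


def expect(pmf, f):
    """E[f(X, Z, Y)] for a pmf stored as {(x, z, y): probability}."""
    return sum((p * f(x, z, y) for (x, z, y), p in pmf.items()), Fraction(0))


def covariance_decomposition(pmf):
    """Return Cov(X,Z), E[Cov(X,Z|Y)] and Cov(E[X|Y], E[Z|Y])."""
    ex = expect(pmf, lambda x, z, y: x)
    ez = expect(pmf, lambda x, z, y: z)
    total_cov = expect(pmf, lambda x, z, y: x * z) - ex * ez

    within = Fraction(0)    # E[Cov(X, Z | Y)]
    between = Fraction(0)   # first accumulates E[E[X|Y] E[Z|Y]]
    for y0 in sorted({y for (_, _, y) in pmf}):
        p_y = expect(pmf, lambda x, z, y: int(y == y0))
        ex_y = expect(pmf, lambda x, z, y: x * (y == y0)) / p_y
        ez_y = expect(pmf, lambda x, z, y: z * (y == y0)) / p_y
        exz_y = expect(pmf, lambda x, z, y: x * z * (y == y0)) / p_y
        within += p_y * (exz_y - ex_y * ez_y)
        between += p_y * ex_y * ez_y
    between -= ex * ez      # subtract E[X] E[Z] to obtain the covariance
    return total_cov, within, between


# Hypothetical joint pmf with X in {0,1,2}, Z in {0,1}, Y in {0,1,2}.
outcomes = list(product(range(3), range(2), range(3)))
weights = [Fraction(1 + x + 2 * y * z + x * y) for x, z, y in outcomes]
pmf = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

total_cov, within, between = covariance_decomposition(pmf)
```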

In Theorem 5.9 we give a purely combinatorial proof of (27). For our purposes it 
is slightly more constructive than a similar proof of the same result in [19]. Also it 
immediately enables us to formulate this result for L-cumulants in the case when 
L satisfies the following property. 

(C3) For every $n > 0$ and each $\pi \in L$ the interval $[\pi, [n]] \subseteq L$ is isomorphic to
$L([|\pi|])$.

This property is satisfied for $\Pi$ (see [22, Example 3.10.4]). A sufficient condition
for L to satisfy (C3) is that for every $n > 0$ the lattice L forms a join subsemilattice
of $\Pi([n])$. Therefore, I as well as the lattice of tree partitions for sufficiently regular
trees (for example caterpillars) both satisfy the property. Condition (C3) does not
hold however for the lattice of one-cluster partitions, (general) tree partitions and
non-crossing partitions.

For every multiset $A \in \mathcal{A}(X)$ denote by $\ell^Y_A$ the conditional L-cumulant of $X_A$
given Y. For any $\pi \in L(A)$, by $\ell_\pi$ denote the L-cumulant of the random vector
$(\ell^Y_B)_{B \in \pi}$.



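Theorem 5.9 (Brillinger's formula for L-cumulants). Let $X = (X_1, \ldots, X_n)$ be a
random vector and Y be a random variable. If L satisfies (C3) then

$\ell_{1 \cdots n} = \sum_{\pi \in L} \ell_\pi$.

Proof. Since $\mu_B = E\mu^Y_B$, by (21) we obtain the identity

(28) $\mu_B = E\Big[ \sum_{\delta \in L(B)} \prod_{C \in \delta} \ell^Y_C \Big]$.

Using (20) and replacing (28) for each $\mu_B$ we can write

$\ell_{1 \cdots n} = \sum_{\pi \in L} m(\pi) \sum_{\delta \leq \pi} \prod_{B \in \pi} E\Big[ \prod_{C \in \delta(B)} \ell^Y_C \Big]$,

where $\delta(B)$ denotes the partition $\delta \in L$ constrained to $B \in \pi$. We change the order
of summation to obtain

(29) $\ell_{1 \cdots n} = \sum_{\delta \in L} \Big[ \sum_{\pi \geq \delta} m(\pi) \prod_{B \in \pi} E\Big[ \prod_{C \in \delta(B)} \ell^Y_C \Big] \Big]$.

For each $\delta = C_1|\cdots|C_r \in L$ denote the set of its blocks by $M_\delta = \{C_1, \ldots, C_r\}$. By
(C3) the interval $[\delta, [n]]$ is isomorphic to $L(M_\delta)$ which is isomorphic to $L([|\delta|])$ and
hence the expression in brackets in (29) can be rewritten as

$\sum_{\pi \in L(M_\delta)} m(\pi) \prod_{B \in \pi} E\Big[ \prod_{C \in B} \ell^Y_C \Big]$,

which by definition is equal to $\ell_\delta$. □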



If (C3) does not hold and we want to perform some efficient conditional com- 
putations, we can still use the classical Brillinger's formula for cumulants and then 
translate them back to L-cumulants using Proposition 4.3. Moreover, for some spe- 
cial statistical models the following result may be useful. It works for all families 
L. 



Proposition 5.10. Let $X = (X_1, \ldots, X_n)$ be a random vector and Y a random
variable. If $X_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp X_n \mid Y$, then

$\ell_{1 \cdots n} = \ell_{1|2|\cdots|n}$,

where by definition $\ell_{1|2|\cdots|n}$ is the L-cumulant of the random vector $(\ell^Y_1, \ldots, \ell^Y_n)$.

Proof. Since $X_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp X_n \mid Y$, by Proposition 5.3, $\ell^Y_C = 0$ unless |C| = 1. Moreover,
we have

$\mu_B = E\big[ \prod_{i \in B} \mu^Y_i \big]$.

Using (20) and replacing the above identity for each $\mu_B$ we can write

$\ell_{1 \cdots n} = \sum_{\pi \in L} m(\pi) \prod_{B \in \pi} E\big[ \prod_{i \in B} \mu^Y_i \big]$.

But since $\ell^Y_i = \mu^Y_i$, the right hand side in the above equation is exactly the
L-cumulant of the random vector $(\ell^Y_1, \ldots, \ell^Y_n)$. □

To see how this result may be relevant in geometry see Example 3.11. 

6. Tree cumulants and hidden Markov processes 

In this section we complement the discussion of tree cumulants and show how 
they can be used to analyze more general processes on trees. 

6.1. Tree models. Let $T^r$ be a rooted tree with vertex set V and edge set E,
that is a tree with one distinguished node $r \in V$ called the root and all the edges
directed away from r. Let $X = (X_v)_{v \in V}$ be a vector of binary random variables
with values 0 and 1. Consider the Bayesian network for X represented by $T^r$. Each
node v corresponds to a random variable $X_v$ and the structure of $T^r$ imposes some
constraints on the joint distribution of X (see for example [7]). Define $\mathcal{M}_T$ as the
model obtained from this Bayesian network by taking the marginal distributions
over the leaves of $T^r$. We call $\mathcal{M}_T$ the two-state general Markov model (see for example
[17, Chapter 8]). We omit the rooting in the notation because the model does not
depend on the rooting. In other words, for any alternative rooting the induced
parametrization will lead to the same model.

The parametric formulation of the model is obtained by expressing the marginal
distribution of X over the leaves of $T^r$ in terms of the marginal distribution of
the root r and conditional distributions of each $v \in V \setminus \{r\}$ given its parent in $T^r$
denoted by pa(v). Assume that $T^r$ has n leaves and label them by elements of [n].
The distribution over the set of leaves satisfies

(30) $p(x_1, \ldots, x_n) = \sum_{x \in \mathcal{H}} p_r(x_r) \prod_{v \in V \setminus r} p_{v|pa(v)}(x_v | x_{pa(v)})$,

where $\mathcal{H}$ is the set of all $x \in \{0,1\}^V$ such that the restriction to the leaves of T is
equal to $(x_1, \ldots, x_n)$. The model is given as the image of (30) in the probability
simplex $\Delta_{2^n - 1}$, where each
point corresponds to a different choice of values for conditional probabilities on the
right hand side of this parametrization. If m denotes the number of inner nodes of
T then this parametrization has $2^m$ terms. For large trees this is a big polynomial
which complicates the geometric and algebraic analysis of these models.

The two-state general Markov model can be equivalently defined by a set of conditional
independence statements. This follows from the general theory of graphical
models (see [7, Section 3.2.2]). We say that two disjoint subsets A, B of the set of
vertices V of T are separated by another subset C if every undirected path from a
node in A to a node in B necessarily crosses C. The set of all conditional independence
statements which define the general Markov model is given by all $A \perp\!\!\!\perp B \mid C$
for all disjoint subsets $A, B, C \subseteq V$ such that C separates A and B. For example
the 4-star tree model discussed in Section 3.3 is defined by $X_1 \perp\!\!\!\perp X_2 \perp\!\!\!\perp X_3 \perp\!\!\!\perp X_4 \mid Y$
because the inner node separates all the leaves from each other.

Before we recall the main result of [27], let us give some intuition on why tree
cumulants may be helpful in the study of tree models. Suppose that for some
edge (u,v) in $T^r$ we impose on the model $\mathcal{M}_T$ that in addition $X_u \perp\!\!\!\perp X_v$. This
corresponds to removing the edge (u, v) from $T^r$ and considering the model of the
induced forest. Let A|B be the split of the set of leaves [n] induced by removing the
edge (u, v). Then the independence statement $X_u \perp\!\!\!\perp X_v$ implies also that $X_A \perp\!\!\!\perp X_B$.

Example 6.1. Let T be the quartet tree in Figure 4 rooted in a. The independence
$(X_1, X_2) \perp\!\!\!\perp (X_3, X_4)$ defines a valid submodel of the tree model for T. This submodel
is defined by requesting $X_a \perp\!\!\!\perp X_b$ and hence it is given as the image of the subspace
of the parameter space restricted to $p_{b|a}(1|0) = p_{b|a}(1|1)$.







Figure 4. A quartet tree. 
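
The parametrization (30) and the submodel of Example 6.1 can be explored with a direct summation over the hidden states. The Python sketch below is a minimal illustration, assuming the quartet tree has inner nodes a and b, with leaves 1, 2 attached to a and leaves 3, 4 attached to b; all parameter values are hypothetical and the function names are ours. `cond_prob1[v]` stores the pair $(p_{v|pa(v)}(1|0), p_{v|pa(v)}(1|1))$.

```python
from fractions import Fraction as F
from itertools import product


def leaf_distribution(root, root_prob1, parent, cond_prob1, leaves):
    """Marginal pmf over the leaves of a binary tree Bayesian network, as in (30)."""
    nodes = [root] + list(parent)
    pmf = {}
    for values in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, values))
        p = root_prob1 if x[root] == 1 else 1 - root_prob1
        for v, u in parent.items():
            p1 = cond_prob1[v][x[u]]          # P(X_v = 1 | X_u = x_u)
            p *= p1 if x[v] == 1 else 1 - p1
        key = tuple(x[leaf] for leaf in leaves)
        pmf[key] = pmf.get(key, 0) + p        # sum over the hidden inner nodes
    return pmf


def factorizes(pmf):
    """True if (X_1, X_2) and (X_3, X_4) are independent under the leaf pmf."""
    p12, p34 = {}, {}
    for (x1, x2, x3, x4), p in pmf.items():
        p12[(x1, x2)] = p12.get((x1, x2), 0) + p
        p34[(x3, x4)] = p34.get((x3, x4), 0) + p
    return all(p == p12[x[:2]] * p34[x[2:]] for x, p in pmf.items())


# Quartet tree rooted in a; hypothetical parameter values.
parent = {"b": "a", 1: "a", 2: "a", 3: "b", 4: "b"}
cond = {
    "b": (F(1, 3), F(5, 6)),
    1: (F(1, 5), F(3, 5)), 2: (F(1, 4), F(1, 2)),
    3: (F(2, 3), F(1, 6)), 4: (F(1, 7), F(4, 7)),
}
generic = leaf_distribution("a", F(2, 5), parent, cond, leaves=(1, 2, 3, 4))

# Submodel of Example 6.1: p_{b|a}(1|0) = p_{b|a}(1|1), so X_a and X_b are independent.
cond_sub = dict(cond, b=(F(1, 3), F(1, 3)))
submodel = leaf_distribution("a", F(2, 5), parent, cond_sub, leaves=(1, 2, 3, 4))
```

Each leaf probability is a sum of $2^2 = 4$ terms, one for each joint state of the two inner nodes, which matches the count of $2^m$ terms for m inner nodes.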



By Proposition 3.3 there exists a tree partition $\pi_0$ such that $\perp\!\!\!\perp_{B \in \pi_0} X_B$ if and
only if $t_I = 0$ whenever $I \subseteq [n]$ is not completely contained in one of the blocks of $\pi_0$.
In Example 6.1, because 12|34 is a valid tree partition, the marginal independence
$(X_1, X_2) \perp\!\!\!\perp (X_3, X_4)$ holds if and only if $t_I = 0$ for all $I \subseteq \{1, 2, 3, 4\}$ such that I is
contained neither in {1, 2} nor in {3, 4}. Hence all $t_{13}, t_{14}, t_{23}, t_{24}, t_{134}, t_{234}, t_{123}, t_{124}$
and $t_{1234}$ vanish whenever $p_{b|a}(1|0) = p_{b|a}(1|1)$. These kinds of considerations help
to understand why tree cumulants are helpful for describing the two-state general
Markov models. They also help to intuitively understand the result in Theorem
6.2, which we now state formally.

Let $\eta_{uv} = p_{v|u}(1|1) - p_{v|u}(1|0)$ for each $(u,v) \in E$. As we have shown, $\eta_{uv} = 0$ if
and only if $X_u \perp\!\!\!\perp X_v$. Moreover, let $\bar\mu_v = 1 - 2\mu_v$ for $v \in V$.

Theorem 6.2 (Zwiernik, Smith [ ]). Let T be a trivalent tree. Then the two-state
general Markov model $\mathcal{M}_T$ can be equivalently expressed in the space of tree
cumulants by $t_i = \mu_i = \frac{1}{2}(1 - \bar\mu_i)$ for i = 1, . . . , n; and for all $|I| \geq 2$

$t_I = \frac{1}{4}(1 - \bar\mu^2_{r(I)}) \prod_{v \in V(I),\, \deg(v) = 3} \bar\mu_v \prod_{(u,v) \in E(I)} \eta_{uv}$,

where V(I) and E(I) denote the vertex and edge sets of the tree T(I), r(I) is the root
of T(I) and deg(v) denotes the valency of v in T(I).

Example 6.3. Let $T^r$ be a quartet tree in Figure 4. Then by Theorem 6.2 we have
for example $t_{12} = \frac{1}{4}(1 - \bar\mu_a^2)\eta_{a1}\eta_{a2}$, $t_{13} = \frac{1}{4}(1 - \bar\mu_a^2)\eta_{a1}\eta_{ab}\eta_{b3}$, $t_{34} = \frac{1}{4}(1 - \bar\mu_b^2)\eta_{b3}\eta_{b4}$
and

$t_{1234} = \frac{1}{4}(1 - \bar\mu_a^2)\bar\mu_a\bar\mu_b\eta_{a1}\eta_{a2}\eta_{ab}\eta_{b3}\eta_{b4}$.

We also infer from this that

$t_{I \cup J}\, t_{I' \cup J'} - t_{I \cup J'}\, t_{I' \cup J} = 0$

for all $I, I' \in \{\{1\}, \{2\}, \{1, 2\}\}$ and $J, J' \in \{\{3\}, \{4\}, \{3, 4\}\}$.

Theorem 6.2 can be applied only for trivalent trees and hence it does not hold for
n-star tree models discussed earlier (see also Remark 3.12). We can use however the
fact that any non-trivalent tree model is a submodel of some model of a trivalent
tree. Thus, if T is not trivalent then we take any trivalent tree T* such that T
can be obtained from T* by edge contractions. Now the two-state general Markov
model for T, when expressed in tree cumulants of T*, is parametrized by $t_i = \mu_i$
for i = 1, . . . , n, and for all $|I| \geq 2$

$t_I = \frac{1}{4}(1 - \bar\mu^2_{r(I)}) \prod_{v \in V(I) \setminus I} \bar\mu_v^{\deg(v)-2} \prod_{(u,v) \in E(I)} \eta_{uv}$.

In the quartet tree of Example 6.3 we can contract the edge (a, b) to obtain the 4-star
tree in Figure 3. This contraction corresponds to the subspace of the parameter
space given by $\bar\mu_a = \bar\mu_b$ and $\eta_{ab} = 1$. This induces the parametrization of the secant
variety given in Example 3.11. The same can be obtained for any n-star tree model
with $n \geq 4$. For more details see [27].

6.2. Binary hidden Markov processes. We now show that tree cumulants can 
be useful also for other related statistical models. We consider models with an 
underlying two-state Markov chain which is not observed, where the observed vari- 
ables are independent given this Markov chain. An example is given by the hidden 
Markov model or some simple cases of Markov switching models without autore- 
gressive terms (see for example [6]). In this section we refer to all these models as 
binary hidden Markov processes. 

Consider tree cumulants induced by the caterpillar tree and define the normalized
tree cumulants as

$\bar t_I = \frac{t_I}{\prod_{i \in I} \sqrt{t_{ii}}}$ for all I,

which is always well defined if all the variables in the system are non-degenerate (a
degenerate random variable takes only one value with nonzero probability). With
this definition $\rho_{ij} := \bar t_{ij}$ is just the usual correlation between $X_i$ and $X_j$, and
$\gamma_i := \bar t_{iii}$ is the skewness of $X_i$.

In this section we deal with an observed vector $X = (X_1, \ldots, X_n)$ and a 
hidden vector $H = (H_1, \ldots, H_n)$. Since we need to consider mixed tree cumulants 
involving indices from both vectors, we introduce the following convention. 
Whenever an index $i$ refers to $H_i$ we write it as $\underline{i}$. Hence for example 
$k_{\underline{i}j} = \mathrm{Cov}(H_i, X_j)$, $k_{\underline{i}\,\underline{i}} = \mathrm{Var}(H_i)$, $k_{ii} = \mathrm{Var}(X_i)$, $\gamma_{\underline{i}} = E(H_i - EH_i)^3 / \mathrm{Var}(H_i)^{3/2}$, 
and $\mu'_{\underline{B}} = E[\prod_{i \in B}(H_i - EH_i)]$. 

It is well known that for every random variable X, if Y is binary, then 

(31) $E(X \mid Y) = EX + \mathrm{Cov}(X,Y)\,(\mathrm{Var}(Y))^{-1}\,(Y - EY)$, 

where $\mathrm{Cov}(X,Y)(\mathrm{Var}(Y))^{-1}$ is the linear regression coefficient of X with respect to 
Y. The following proposition shows that the hidden Markov process has an elegant 
formulation and all its normalized tree cumulants are parametrized by correlations 
and skewnesses. 
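
Identity (31) is easy to check numerically. The following Python sketch builds a small joint distribution of (X, Y) with Y binary and compares E(X | Y = y), computed directly, with the right hand side of (31). The probabilities and the names `joint`, `cond_mean` and `regression` are illustrative assumptions and do not come from the paper.

```python
# Joint distribution of (X, Y): X takes three arbitrary values, Y is binary.
# The probabilities are illustrative only and sum to one.
joint = {
    (-1.0, 0): 0.10, (0.5, 0): 0.25, (2.0, 0): 0.05,
    (-1.0, 1): 0.20, (0.5, 1): 0.10, (2.0, 1): 0.30,
}


def expect(f):
    """Expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - ex) * (y - ey))
var_y = expect(lambda x, y: (y - ey) ** 2)


def cond_mean(y0):
    """E(X | Y = y0) computed directly from the joint distribution."""
    py = sum(p for (x, y), p in joint.items() if y == y0)
    return sum(p * x for (x, y), p in joint.items() if y == y0) / py


def regression(y0):
    """Right hand side of (31) evaluated at Y = y0."""
    return ex + cov_xy / var_y * (y0 - ey)


# Largest discrepancy between the two sides of (31) over both values of Y.
max_gap = max(abs(cond_mean(y0) - regression(y0)) for y0 in (0, 1))
```

The two sides agree up to rounding because any function of a binary variable is affine in that variable, so the conditional expectation coincides with the best linear predictor.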
Proposition 6.4. Let $X = (X_1, \ldots, X_n)$ be a random vector and $H = (H_1, \ldots, H_n)$ 
a binary random vector (both non-degenerate). Assume that $X_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp X_n \mid H$ and 
the conditional distribution of $X_i$ given $H$ depends only on $H_i$ for $i = 1, \ldots, n$. 
Moreover, let $H$ form a Markov chain. Then for every $I = \{i_1, \ldots, i_d\}$ such that 
$1 \leq i_1 < \cdots < i_d \leq n$ the corresponding normalized tree cumulant satisfies 

$$\bar{t}_I = \prod_{j=2}^{d-1} \gamma_{\underline{i_j}} \prod_{i=i_1}^{i_d-1} \rho_{\underline{i}\,\underline{i+1}} \prod_{i \in I} \rho_{i\underline{i}}.$$ 

Proof. Before we prove the proposition we formulate the following result. 

Lemma 6.5. Suppose that $X = (X_1, \ldots, X_n)$ is a binary random vector such that 
$i \perp\!\!\!\perp j \perp\!\!\!\perp C \mid r$ for some disjoint $i, j, r \in [n]$ and $C \subseteq [n]$. Let $\eta_{rA} = \mu'_{rA} k_{rr}^{-1}$ for every 
$A \subseteq [n]$ and $\tau_r = k_{rrr} k_{rr}^{-1}$. Then 

$$\mu'_{ijC} = \eta_{ri}\eta_{rj} k_{rr} \mu'_C + \eta_{ri}\eta_{rj} \mu'_{rC} \tau_r.$$ 

Proof. Let $U_A := \prod_{i \in A}(X_i - EX_i)$ for every $A \subseteq [n]$. The conditional 
independence $i \perp\!\!\!\perp j \perp\!\!\!\perp C \mid r$ implies 

$$E[U_{ijC} \mid U_r] = E[U_i \mid U_r]\, E[U_j \mid U_r]\, E[U_C \mid U_r].$$ 

Using (31) for the conditional expectations on the right hand side and then taking 
expectations on both sides yields 

$$\mu'_{ijC} = \eta_{ri}\eta_{rj} k_{rr} \mu'_C + \eta_{ri}\eta_{rj}\eta_{rC} k_{rrr}.$$ 

Replace $\eta_{rC} = \mu'_{rC} k_{rr}^{-1}$ to obtain the formula in Lemma 6.5. □ 
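
The identity of Lemma 6.5, read here as $\mu'_{ijC} = \eta_{ri}\eta_{rj}k_{rr}\mu'_C + \eta_{ri}\eta_{rj}\mu'_{rC}\tau_r$ (the form in which it is applied below with i = 1, j = 2), can be verified exactly on a small example. The Python sketch below is only an illustration under assumed parameters: it constructs five binary variables in which i, j and the block C = {c1, c2} are conditionally independent given r, while c1 and c2 may depend on each other. All numerical values and helper names (`bern`, `central`) are hypothetical.

```python
from itertools import product

# Hypothetical parameters: P(X_r = 1) and conditional laws given X_r.
p_r = 0.3
p_i = {0: 0.2, 1: 0.7}  # P(X_i = 1 | X_r)
p_j = {0: 0.6, 1: 0.1}  # P(X_j = 1 | X_r)
# Conditional law of the block C = (c1, c2) given X_r; c1, c2 may be dependent.
p_c = {
    0: {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3},
    1: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.1},
}


def bern(p, x):
    """Probability that a Bernoulli(p) variable equals x."""
    return p if x == 1 else 1.0 - p


# Joint law of (r, i, j, c1, c2) with i, j and C independent given r.
joint = {}
for r, i, j, c1, c2 in product((0, 1), repeat=5):
    joint[(r, i, j, c1, c2)] = (
        bern(p_r, r) * bern(p_i[r], i) * bern(p_j[r], j) * p_c[r][(c1, c2)]
    )

means = [sum(p * s[k] for s, p in joint.items()) for k in range(5)]


def central(idx):
    """Central moment E[prod_{k in idx} (X_k - E X_k)]; indices may repeat."""
    total = 0.0
    for s, p in joint.items():
        term = p
        for k in idx:
            term *= s[k] - means[k]
        total += term
    return total


R, I, J, C = 0, 1, 2, (3, 4)
k_rr = central((R, R))
eta_ri = central((R, I)) / k_rr
eta_rj = central((R, J)) / k_rr
tau_r = central((R, R, R)) / k_rr

lhs = central((I, J) + C)
rhs = eta_ri * eta_rj * (k_rr * central(C) + central((R,) + C) * tau_r)
```

Here `lhs` and `rhs` agree up to floating point error. Changing `p_c` so that the law of C no longer factorizes through r alone (for instance by letting it depend on i) breaks the equality, which shows where the conditional independence is used.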

To prove Proposition 6.4 first assume that $I = [n]$ and denote by $L$ the lattice 
of tree partitions of the caterpillar tree with n leaves. We can divide the partitions 
in $L$ into two groups: 

1. partitions with 1 and 2 in two different blocks $1A$ and $2B$, and 

2. partitions with 1 and 2 in a single block $12A$. 

By Remark 3.10 we can write 

(32) $t_{1 \cdots n} = \sum_{\pi \in L} m(\pi) \prod_{B \in \pi} \mu'_B$. 

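In the first group of partitions we always have either $A = \emptyset$ or $B = \emptyset$. Since 
$\mu'_1 = \mu'_2 = 0$, for every $\pi$ in the first group the corresponding summand in (32) is 
zero. Let $\delta_0 = 12|3|\cdots|n$. The set of all partitions in the second group forms an 
interval $[\delta_0, [n]]$, which is isomorphic to the set of all tree partitions of the subtree 
$T_2$ of $T$ with $n - 1$ leaves given by the hidden vertex $\underline{2}$ and the remaining leaves of 
$T$: $3, \ldots, n$. This isomorphism is given by replacing each block $12A$ with a block 
$\underline{2}A$. Denote the lattice of all partitions of $T_2$ by $L_2$. Since $[\delta_0, [n]] \cong L_2$, the Möbius 
function on $L$ restricted to this interval is equal to the Möbius function on $L_2$. 
For every $A \subseteq [n] \setminus \{1, 2\}$ we have that $X_1 \perp\!\!\!\perp X_2 \perp\!\!\!\perp X_A \mid H_2$ and hence, by Lemma 6.5, 

$$\mu'_{12A} = \eta_{\underline{2}1}\eta_{\underline{2}2} k_{\underline{2}\,\underline{2}} \mu'_A + \eta_{\underline{2}1}\eta_{\underline{2}2} \mu'_{\underline{2}A} \tau_{\underline{2}}.$$ 

Therefore, (32) becomes 

(33) $t_{1 \cdots n} = \sum_{\pi \in [\delta_0,[n]]} m(\pi) \prod_{B \in \pi,\, B \neq 12A} \mu'_B \cdot \eta_{\underline{2}1}\eta_{\underline{2}2} k_{\underline{2}\,\underline{2}} \mu'_A + \eta_{\underline{2}1}\eta_{\underline{2}2}\tau_{\underline{2}} \sum_{\pi \in L_2} m(\pi) \prod_{B \in \pi} \mu'_B$, 

where in the first sum $12A$ denotes the block of $\pi$ containing 1 and 2. 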
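Let $\pi_0$ be the split $12 \,|\, [n] \setminus \{1, 2\}$. For every $\pi \in [\delta_0, [n]]$ the partition $\pi \wedge \pi_0$ is the 
partition obtained from $\pi$ by splitting the block $12A$ into two blocks $12$ and $A$. 
With this notation the first summand in (33) can be rewritten as 

$$\sum_{\nu \in [\delta_0, \pi_0]} \Big[ \sum_{\pi:\, \pi \wedge \pi_0 = \nu} m(\pi) \Big] \prod_{B \in \nu,\, B \neq 12} \mu'_B \cdot \eta_{\underline{2}1}\eta_{\underline{2}2} k_{\underline{2}\,\underline{2}}.$$ 

Since the interval $[\delta_0, [n]]$ forms a lattice, by Lemma 5.1 the above expression 
is zero. Since $\sum_{\pi \in L_2} m(\pi) \prod_{B \in \pi} \mu'_B = t_{([n] \setminus \{1,2\}) \cup \{\underline{2}\}}$, the second summand 
in (33) gives 

(34) $t_{1 \cdots n} = \eta_{\underline{2}1}\eta_{\underline{2}2}\tau_{\underline{2}}\, t_{([n] \setminus \{1,2\}) \cup \{\underline{2}\}}$. 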

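Using (31) we can also prove that $\eta_{\underline{2}1} = k_{1\underline{1}} k_{\underline{1}\,\underline{2}} k_{\underline{1}\,\underline{1}}^{-1} k_{\underline{2}\,\underline{2}}^{-1}$ (use the fact that $X_1 \perp\!\!\!\perp H_2 \mid H_1$). 
In the next step we can apply the same procedure as above to express $t_{([n] \setminus \{1,2\}) \cup \{\underline{2}\}}$ 
in (34) in terms of $\eta_{\underline{3}\,\underline{2}}$, $\eta_{\underline{3}3}$, $\tau_{\underline{3}}$ and $t_{([n] \setminus \{1,2,3\}) \cup \{\underline{3}\}}$. We can do it recursively 
until we obtain 

$$t_{1 \cdots n} = \prod_{i=1}^{n} \eta_{\underline{i}i} \prod_{i=1}^{n-1} k_{\underline{i}\,\underline{i+1}} \prod_{i=2}^{n-1} \tau_{\underline{i}}\, k_{\underline{i}\,\underline{i}}^{-1}.$$ 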

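Divide both sides by $\sqrt{k_{11} \cdots k_{nn}}$. The main proposition follows for $I = [n]$ after 
some obvious algebraic rearrangements. In the general case we first use the formula 
for $[n]$ to conclude that for every $I = \{i_1, \ldots, i_d\}$, where $i_1 < \cdots < i_d$, 

$$\bar{t}_I = \prod_{j=2}^{d-1} \gamma_{\underline{i_j}} \prod_{j=1}^{d-1} \rho_{\underline{i_j}\,\underline{i_{j+1}}} \prod_{i \in I} \rho_{i\underline{i}}.$$ 

To prove the final formula, it remains to show that 

$$\rho_{\underline{i_j}\,\underline{i_{j+1}}} = \prod_{i=i_j}^{i_{j+1}-1} \rho_{\underline{i}\,\underline{i+1}},$$ 

which can be proved by induction using (31). □ 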
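
The last step of the proof says that correlations multiply along a binary Markov chain. As a sanity check, the following Python sketch computes the exact joint law of a short, non-homogeneous two-state chain and compares the correlation of its end points with the product of the consecutive correlations. The initial law and the transition matrices are hypothetical values chosen for illustration.

```python
from itertools import product
from math import sqrt

# Hypothetical initial law of the first state and transition matrices;
# trans[t][a][b] = P(next state = b | current state = a) at step t.
init = [0.35, 0.65]
trans = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9]],
    [[0.5, 0.5], [0.25, 0.75]],
]
n = len(trans) + 1  # length of the chain

# Exact joint law of the whole chain.
joint = {}
for h in product((0, 1), repeat=n):
    p = init[h[0]]
    for t in range(n - 1):
        p *= trans[t][h[t]][h[t + 1]]
    joint[h] = p


def corr(a, b):
    """Correlation between the states at positions a and b (0-based)."""
    ea = sum(p * h[a] for h, p in joint.items())
    eb = sum(p * h[b] for h, p in joint.items())
    cov = sum(p * (h[a] - ea) * (h[b] - eb) for h, p in joint.items())
    va = sum(p * (h[a] - ea) ** 2 for h, p in joint.items())
    vb = sum(p * (h[b] - eb) ** 2 for h, p in joint.items())
    return cov / sqrt(va * vb)


direct = corr(0, n - 1)  # correlation between the first and last state
chained = 1.0
for t in range(n - 1):  # product of one-step correlations
    chained *= corr(t, t + 1)
```

The values `direct` and `chained` coincide up to rounding. The check relies on the states being binary, since (31) is what makes each conditional expectation linear.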

This proposition enables us to analyze the moment structure of hidden Markov 
processes. 

Example 6.6 (Homogeneous binary hidden Markov model). Consider a homogeneous 
binary hidden Markov model. In this case $H = (H_i)_{i=1}^{n}$ forms a homogeneous 
two-state Markov chain which we assume to start from its stationary distribution. 
Moreover, the conditional distribution of $X_i$ given $H_i$ is the same for every 
$i = 1, \ldots, n$. Under these assumptions the marginal distribution of $H_i$ is equal 
to the marginal distribution of $H_1$ for every $i = 2, \ldots, n$. Let $\gamma$ be the skewness 
of $H_1$, $\rho = \mathrm{Corr}(H_1, H_2)$ be the one step correlation of the Markov chain $H$, and 
$b = \mathrm{Corr}(H_1, X_1)$. By Proposition 6.4, for every $d \geq 2$ and $1 \leq i_1 < \cdots < i_d \leq n$, 

(35) $\bar{t}_{i_1 \cdots i_d} = b^d \rho^{i_d - i_1} \gamma^{d-2}$. 

This in turn induces some constraints on the tree cumulants of the observed 
variables which may be useful to construct simple diagnostic tests for this class of 
models. For example it is easy to check that 

$$\bar{t}_{i(i+2)}\,\bar{t}_{j(j+2)} = \bar{t}_{k(k+3)}\,\bar{t}_{l(l+1)} \quad \text{for every } i, j, k, l = 1, \ldots, n$$ 

and that $\bar{t}_{ij}\,\bar{t}_{ik}\,\bar{t}_{jk} \geq 0$ for all $i < j < k$. The monomial parametrization in (35) 
enables us to obtain the equations for higher order tree cumulants. 
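
The monomial form (35) can be checked exactly for d = 2 and d = 3 on a small model. The Python sketch below is a minimal illustration under stated assumptions: the observed variables are taken to be binary, the transition and emission probabilities are hypothetical, and for d at most 3 the normalized tree cumulant is identified with the standardized central moment (in the expansion (32) every partition other than the single block then contains a singleton block, whose central moment is zero).

```python
from itertools import combinations, product
from math import sqrt

n = 4
a, c = 0.2, 0.5  # hypothetical P(0 -> 1) and P(1 -> 0) of the hidden chain
trans = [[1 - a, a], [c, 1 - c]]
stat = [c / (a + c), a / (a + c)]  # stationary law of the hidden chain
emit = [0.15, 0.8]  # hypothetical P(X_i = 1 | H_i = h)

# Exact joint law of (H_1..H_n, X_1..X_n); a state is the tuple h + x.
joint = {}
for h in product((0, 1), repeat=n):
    ph = stat[h[0]]
    for t in range(n - 1):
        ph *= trans[h[t]][h[t + 1]]
    for x in product((0, 1), repeat=n):
        px = 1.0
        for t in range(n):
            px *= emit[h[t]] if x[t] == 1 else 1 - emit[h[t]]
        joint[h + x] = ph * px

means = [sum(p * s[k] for s, p in joint.items()) for k in range(2 * n)]


def central(idx):
    """Central moment over the coordinates in idx (indices may repeat)."""
    total = 0.0
    for s, p in joint.items():
        term = p
        for k in idx:
            term *= s[k] - means[k]
        total += term
    return total


def std_moment(idx):
    """Central moment divided by the standard deviations of its arguments."""
    scale = 1.0
    for k in idx:
        scale *= sqrt(central((k, k)))
    return central(idx) / scale


H, X = 0, n  # offsets of the hidden and observed coordinates in a state
gamma = std_moment((H, H, H))  # skewness of H_1
rho = std_moment((H, H + 1))   # Corr(H_1, H_2)
b = std_moment((H, X))         # Corr(H_1, X_1)

gaps = []
for d in (2, 3):
    for idx in combinations(range(n), d):
        lhs = std_moment(tuple(X + i for i in idx))
        rhs = b ** d * rho ** (idx[-1] - idx[0]) * gamma ** (d - 2)
        gaps.append(abs(lhs - rhs))
max_gap = max(gaps)
```

For d of at least 4 the tree cumulant no longer reduces to a single central moment, so extending the check would require the Möbius function of the lattice of tree partitions, which this sketch does not implement.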